2024-08-19 16:40:02,064 INFO [train_multi_KD3.py:1188] (1/4) Training started 2024-08-19 16:40:02,064 INFO [train_multi_KD3.py:1198] (1/4) Device: cuda:1 2024-08-19 16:40:02,064 INFO [train_multi_KD3.py:1214] (1/4) Using dtype=torch.bfloat16 2024-08-19 16:40:02,064 INFO [train_multi_KD3.py:1216] (1/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': 'e400fa3b456faf8afe0ee5bfe572946b4921a3db', 'k2-git-date': 'Sat Jul 15 04:21:50 2023', 'lhotse-version': '1.16.0', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.9', 'icefall-git-branch': 'multi_KD_with_wenet', 'icefall-git-sha1': '3210a8ed-dirty', 'icefall-git-date': 'Mon Aug 19 16:16:48 2024', 'icefall-path': '/xy/mnt/yangxiaoyu/workspace/icefall_multi_KD', 'k2-path': '/root/anaconda3/lib/python3.9/site-packages/k2/__init__.py', 'lhotse-path': '/root/anaconda3/lib/python3.9/site-packages/lhotse/__init__.py', 'hostname': 'NGK_xiaoyu'}, 'world_size': 4, 'master_port': 13440, 'tensorboard': True, 'num_epochs': 35, 'start_epoch': 31, 'start_batch': 0, 'exp_dir': PosixPath('multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16'), 'bpe_model': 'data/lang_bpe_500/bpe.model', 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'stop_early': True, 'use_fp16': False, 'use_bf16': True, 'share_asr': True, 'beats_loss_scale': 1.0, 'ecapa_loss_scale': 
10.0, 'whisper_loss_scale': 1.0, 'whisper_cb_loss_scale': 0.01, 'repeat_librispeech': 5, 'repeat_wenetspeech': 0, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': True, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'use_transducer': True, 'use_ctc': False, 'speaker_input_idx': 2, 'whisper_dim': 1280, 'use_task_id': True, 'num_codebooks': 32, 'mvq_kd_layer_idx': -1, 'use_subsampled_output': True, 'delta_t': 6, 'full_libri': True, 'mini_libri': False, 'use_libriheavy': False, 'libriheavy_subset': 'small', 'use_librispeech': True, 'use_wenetspeech': False, 'use_audioset': True, 'audioset_subset': 'unbalanced', 'use_voxceleb': True, 'voxceleb_subset': 'vox2', 'use_fma': False, 'fma_subset': 'large', 'manifest_dir': PosixPath('data/fbank_LSVoxAs_with_whisper_large-v3_with_taskID'), 'max_duration': 1500, 'bucketing_sampler': False, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 1, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'enable_musan': False, 'enable_audioset': False, 'use_musan_separately': False, 'input_strategy': 'PrecomputedFeatures', 'drop_features': False, 'return_audio': False, 'use_beats': True, 'use_ecapa': True, 'use_whisper': True, 'whisper_mvq': False, 'beats_ckpt': 'data/models/BEATs/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt', 'whisper_version': 'large-v3', 'use_mert': False, 'dtype': torch.bfloat16, 'use_amp': True} 2024-08-19 16:40:02,064 INFO [train_multi_KD3.py:1218] (1/4) About to create model 2024-08-19 
16:40:02,408 INFO [model_shift.py:142] (1/4) Delta_t: 6 when computing the distillation loss 2024-08-19 16:40:02,412 INFO [train_multi_KD3.py:1222] (1/4) Number of model parameters: 66484678 2024-08-19 16:40:02,412 INFO [checkpoint.py:112] (1/4) Loading checkpoint from multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/epoch-30.pt 2024-08-19 16:40:04,659 INFO [train_multi_KD3.py:1237] (1/4) Using DDP 2024-08-19 16:40:06,066 INFO [train_multi_KD3.py:1249] (1/4) Loading optimizer state dict 2024-08-19 16:40:06,302 INFO [train_multi_KD3.py:1257] (1/4) Loading scheduler state dict 2024-08-19 16:40:06,302 INFO [kd_datamodule.py:690] (1/4) About to get train 960 cuts 2024-08-19 16:40:06,345 INFO [kd_datamodule.py:862] (1/4) About to get the voxceleb cuts. 2024-08-19 16:40:06,346 INFO [kd_datamodule.py:873] (1/4) Adding voxceleb2 cuts. 2024-08-19 16:40:06,348 INFO [train_multi_KD3.py:1320] (1/4) Getting audioset cuts 2024-08-19 16:40:06,348 INFO [kd_datamodule.py:881] (1/4) About to get the audioset cuts for KD. 
2024-08-19 16:40:06,350 INFO [train_multi_KD3.py:1326] (1/4) Using mux to combine Librispeech: True, WenetSpeech: False, audioset: True and voxceleb: True 2024-08-19 16:40:14,312 INFO [train_multi_KD3.py:1328] (1/4) Using mux to combine [CutSet(len=1406195) [underlying data type: ], CutSet(len=1187704) [underlying data type: ], CutSet(len=1904746) [underlying data type: ]] 2024-08-19 16:40:14,312 INFO [train_multi_KD3.py:1329] (1/4) Using weights: [1406195, 1187704, 1904746] 2024-08-19 16:40:14,312 INFO [train_multi_KD3.py:1338] (1/4) CutSet(len=4498645) [underlying data type: ] 2024-08-19 16:40:14,312 INFO [kd_datamodule.py:449] (1/4) Disable MUSAN 2024-08-19 16:40:14,312 INFO [kd_datamodule.py:489] (1/4) Disable SpecAugment 2024-08-19 16:40:14,312 INFO [kd_datamodule.py:491] (1/4) About to create train dataset 2024-08-19 16:40:14,313 INFO [kd_datamodule.py:528] (1/4) Using SimpleCutSampler 2024-08-19 16:40:14,313 INFO [kd_datamodule.py:536] (1/4) About to create train dataloader 2024-08-19 16:40:14,314 INFO [kd_datamodule.py:756] (1/4) About to get dev-clean cuts 2024-08-19 16:40:14,315 INFO [kd_datamodule.py:774] (1/4) About to get dev-other cuts 2024-08-19 16:40:14,317 INFO [kd_datamodule.py:570] (1/4) About to create dev dataset 2024-08-19 16:40:14,596 INFO [kd_datamodule.py:591] (1/4) About to create dev dataloader 2024-08-19 16:40:14,597 INFO [kd_datamodule.py:833] (1/4) About to get the test set of voxceleb1 set. 2024-08-19 16:40:14,597 INFO [kd_datamodule.py:570] (1/4) About to create dev dataset 2024-08-19 16:40:14,828 INFO [kd_datamodule.py:591] (1/4) About to create dev dataloader 2024-08-19 16:40:14,828 INFO [kd_datamodule.py:893] (1/4) About to get the audioset eval cuts. 
2024-08-19 16:40:14,830 INFO [kd_datamodule.py:570] (1/4) About to create dev dataset 2024-08-19 16:40:15,330 INFO [kd_datamodule.py:591] (1/4) About to create dev dataloader 2024-08-19 16:40:15,330 INFO [train_multi_KD3.py:1418] (1/4) ['ASR_libri', 'SV_voxceleb1', 'AT_audioset'] 2024-08-19 16:40:15,330 INFO [train_multi_KD3.py:1422] (1/4) Loading grad scaler state dict 2024-08-19 16:40:31,713 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 0, loss[loss=0.1309, beats_loss=0.008062, ecapa_loss=0.0001364, whisper_loss=0.1215, over 23139.00 frames. ], tot_loss[loss=0.1309, beats_loss=0.008062, ecapa_loss=0.0001364, whisper_loss=0.1215, over 23139.00 frames. ], batch size: 87, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 16:40:31,713 INFO [train_multi_KD3.py:1140] (1/4) Computing validation loss 2024-08-19 16:41:05,598 INFO [train_multi_KD3.py:1150] (1/4) Epoch 31, validation on ASR_libri: loss=0.253, beats_loss=0, ecapa_loss=0.0005148, whisper_loss=0.2478, over 931116.00 frames. 2024-08-19 16:41:25,357 INFO [train_multi_KD3.py:1150] (1/4) Epoch 31, validation on SV_voxceleb1: loss=0.003992, beats_loss=0, ecapa_loss=0.0003992, whisper_loss=0, over 944235.00 frames. 2024-08-19 16:42:59,782 INFO [train_multi_KD3.py:1150] (1/4) Epoch 31, validation on AT_audioset: loss=0.02301, beats_loss=0.02301, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 16:42:59,783 INFO [train_multi_KD3.py:1156] (1/4) Maximum memory allocated so far is 30838MB 2024-08-19 16:43:00,142 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 16:43:01,893 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.07 vs. limit=15.0 2024-08-19 16:43:04,566 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.89 vs. 
limit=22.5 2024-08-19 16:43:07,381 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.47 vs. limit=22.5 2024-08-19 16:43:17,726 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 33 from LS+wenet, 21 from Vox, 35 from AS 2024-08-19 16:43:57,632 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4446090.0, ans=0.1 2024-08-19 16:44:22,638 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 16 from LS+wenet, 19 from Vox, 31 from AS 2024-08-19 16:44:25,185 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 23 from LS+wenet, 22 from Vox, 42 from AS 2024-08-19 16:44:36,549 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 36 from LS+wenet, 20 from Vox, 34 from AS 2024-08-19 16:44:43,280 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.992e+01 2.403e+01 2.746e+01 3.133e+01 8.282e+01, threshold=5.492e+01, percent-clipped=1.0 2024-08-19 16:44:43,621 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 28 from LS+wenet, 24 from Vox, 34 from AS 2024-08-19 16:44:48,646 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 16:44:50,208 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 23 from LS+wenet, 23 from Vox, 18 from AS 2024-08-19 16:45:00,510 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 50, loss[loss=0.09177, beats_loss=0.008582, ecapa_loss=0.0001196, whisper_loss=0.08199, over 17351.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.009152, ecapa_loss=0.0001485, whisper_loss=0.08979, over 871495.92 frames. 
], batch size: 63, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 16:45:11,409 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4446390.0, ans=0.125 2024-08-19 16:45:18,556 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4446390.0, ans=0.0 2024-08-19 16:45:25,116 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4446490.0, ans=0.125 2024-08-19 16:46:17,878 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 18 from LS+wenet, 16 from Vox, 22 from AS 2024-08-19 16:46:18,094 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4446690.0, ans=0.1 2024-08-19 16:46:49,472 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.66 vs. limit=15.0 2024-08-19 16:46:55,261 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 100, loss[loss=0.0987, beats_loss=0.009169, ecapa_loss=0.0001305, whisper_loss=0.08822, over 15725.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.009043, ecapa_loss=0.0001446, whisper_loss=0.09102, over 1509185.71 frames. ], batch size: 60, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 16:47:02,758 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 32 from LS+wenet, 24 from Vox, 31 from AS 2024-08-19 16:48:23,820 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 
23 from LS+wenet, 22 from Vox, 40 from AS 2024-08-19 16:48:24,823 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.103e+01 2.615e+01 2.787e+01 3.101e+01 5.493e+01, threshold=5.575e+01, percent-clipped=1.0 2024-08-19 16:48:40,686 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 150, loss[loss=0.08488, beats_loss=0.01075, ecapa_loss=0.0001716, whisper_loss=0.07241, over 19505.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.009121, ecapa_loss=0.0001423, whisper_loss=0.09132, over 2053104.12 frames. ], batch size: 78, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 16:48:49,230 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.725e+05 2024-08-19 16:48:55,669 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 18 from LS+wenet, 14 from Vox, 26 from AS 2024-08-19 16:49:35,282 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4447690.0, ans=0.125 2024-08-19 16:49:54,745 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4447790.0, ans=0.0 2024-08-19 16:50:13,649 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 200, loss[loss=0.09455, beats_loss=0.007243, ecapa_loss=0.0001584, whisper_loss=0.08572, over 19705.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.009348, ecapa_loss=0.0001406, whisper_loss=0.09155, over 2428570.59 frames. ], batch size: 78, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 16:50:20,378 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 32 from LS+wenet, 15 from Vox, 34 from AS 2024-08-19 16:50:20,845 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.35 vs. 
limit=15.0 2024-08-19 16:50:22,819 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4447890.0, ans=0.0 2024-08-19 16:50:40,944 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4447990.0, ans=0.0 2024-08-19 16:50:59,880 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.98 vs. limit=15.0 2024-08-19 16:51:04,871 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4448190.0, ans=0.1 2024-08-19 16:51:06,836 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4448190.0, ans=0.1 2024-08-19 16:51:19,251 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 32 from LS+wenet, 27 from Vox, 29 from AS 2024-08-19 16:51:24,146 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.343e+01 2.586e+01 2.857e+01 5.487e+01, threshold=5.172e+01, percent-clipped=0.0 2024-08-19 16:51:34,812 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 18 from LS+wenet, 27 from Vox, 22 from AS 2024-08-19 16:51:38,000 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 250, loss[loss=0.1069, beats_loss=0.00933, ecapa_loss=0.0001664, whisper_loss=0.09587, over 16499.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.009614, ecapa_loss=0.0001394, whisper_loss=0.09163, over 2718741.05 frames. ], batch size: 66, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 16:51:38,401 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4448390.0, ans=0.125 2024-08-19 16:51:49,107 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.45 vs. 
limit=10.0 2024-08-19 16:51:50,054 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 18 from LS+wenet, 14 from Vox, 29 from AS 2024-08-19 16:52:01,549 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4448490.0, ans=0.125 2024-08-19 16:52:29,143 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4448690.0, ans=0.125 2024-08-19 16:52:32,749 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4448690.0, ans=0.125 2024-08-19 16:53:03,188 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 300, loss[loss=0.1056, beats_loss=0.01202, ecapa_loss=0.0001048, whisper_loss=0.09254, over 18448.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.009872, ecapa_loss=0.0001406, whisper_loss=0.09181, over 2951380.89 frames. ], batch size: 69, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 16:53:18,296 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=4448990.0, ans=0.05 2024-08-19 16:53:19,412 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4448990.0, ans=0.1 2024-08-19 16:53:34,087 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.37 vs. limit=15.0 2024-08-19 16:53:40,096 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4449090.0, ans=0.0 2024-08-19 16:53:42,135 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=18.67 vs. limit=22.5 2024-08-19 16:53:47,787 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 
27 from LS+wenet, 31 from Vox, 35 from AS 2024-08-19 16:54:10,076 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4449290.0, ans=0.125 2024-08-19 16:54:11,288 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.292e+01 2.578e+01 2.922e+01 3.653e+02, threshold=5.156e+01, percent-clipped=3.0 2024-08-19 16:54:11,523 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 24 from LS+wenet, 19 from Vox, 35 from AS 2024-08-19 16:54:23,844 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 350, loss[loss=0.1049, beats_loss=0.01109, ecapa_loss=0.0001552, whisper_loss=0.09227, over 19749.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.009966, ecapa_loss=0.0001411, whisper_loss=0.09179, over 3129286.98 frames. ], batch size: 81, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 16:54:27,601 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4449390.0, ans=0.0 2024-08-19 16:54:43,248 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.00 vs. limit=22.5 2024-08-19 16:55:43,244 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 400, loss[loss=0.07551, beats_loss=0.01182, ecapa_loss=0.0001172, whisper_loss=0.06252, over 15468.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01013, ecapa_loss=0.0001417, whisper_loss=0.0909, over 3244305.73 frames. ], batch size: 60, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 16:55:45,203 INFO [train_multi_KD3.py:845] (1/4) A total of 52 cuts. 13 from LS+wenet, 17 from Vox, 22 from AS 2024-08-19 16:55:48,864 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4449890.0, ans=0.0 2024-08-19 16:55:53,826 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 
12 from LS+wenet, 13 from Vox, 30 from AS 2024-08-19 16:55:58,847 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4449990.0, ans=0.0 2024-08-19 16:56:06,519 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 22 from LS+wenet, 17 from Vox, 44 from AS 2024-08-19 16:56:19,354 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 27 from LS+wenet, 19 from Vox, 35 from AS 2024-08-19 16:56:27,908 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4450090.0, ans=0.07 2024-08-19 16:56:39,781 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 26 from LS+wenet, 20 from Vox, 43 from AS 2024-08-19 16:56:44,552 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 19 from LS+wenet, 13 from Vox, 27 from AS 2024-08-19 16:56:46,410 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=4450190.0, ans=0.05 2024-08-19 16:56:52,311 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+01 2.167e+01 2.459e+01 2.674e+01 3.849e+01, threshold=4.918e+01, percent-clipped=0.0 2024-08-19 16:57:03,516 WARNING [optim.py:496] (1/4) Scaling gradients by 0.08908264338970184, model_norm_threshold=49.18006896972656 2024-08-19 16:57:03,673 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.931e+04, grad_sumsq=3.931e+04, orig_rms_sq=1.000e+00 2024-08-19 16:57:05,402 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 450, loss[loss=0.1122, beats_loss=0.008934, ecapa_loss=0.0001635, whisper_loss=0.1017, over 15364.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01022, ecapa_loss=0.0001413, whisper_loss=0.09019, over 3381432.28 frames. 
], batch size: 62, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 16:57:48,626 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4450590.0, ans=10.0 2024-08-19 16:57:50,598 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.59 vs. limit=15.0 2024-08-19 16:58:02,392 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4450690.0, ans=0.125 2024-08-19 16:58:21,923 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4450790.0, ans=0.125 2024-08-19 16:58:26,835 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4450890.0, ans=0.0 2024-08-19 16:58:28,190 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 500, loss[loss=0.09277, beats_loss=0.01309, ecapa_loss=0.0001196, whisper_loss=0.07848, over 18333.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0103, ecapa_loss=0.0001406, whisper_loss=0.0895, over 3486896.13 frames. ], batch size: 73, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 16:58:36,988 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.48 vs. limit=15.0 2024-08-19 16:59:05,847 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4451090.0, ans=0.1 2024-08-19 16:59:20,767 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 
24 from LS+wenet, 15 from Vox, 26 from AS 2024-08-19 16:59:20,993 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4451190.0, ans=0.125 2024-08-19 16:59:28,995 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4451190.0, ans=0.125 2024-08-19 16:59:36,734 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.738e+01 2.363e+01 2.706e+01 3.045e+01 5.521e+02, threshold=5.412e+01, percent-clipped=1.0 2024-08-19 16:59:49,869 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 550, loss[loss=0.1092, beats_loss=0.009488, ecapa_loss=0.0001557, whisper_loss=0.09818, over 20142.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01028, ecapa_loss=0.0001411, whisper_loss=0.08936, over 3542218.89 frames. ], batch size: 79, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 16:59:50,391 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4451390.0, ans=0.125 2024-08-19 16:59:51,496 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 
15 from LS+wenet, 22 from Vox, 37 from AS 2024-08-19 17:00:28,857 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4451490.0, ans=0.1 2024-08-19 17:00:50,761 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4451690.0, ans=0.1 2024-08-19 17:00:55,572 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4451690.0, ans=0.125 2024-08-19 17:00:55,620 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4451690.0, ans=0.1 2024-08-19 17:00:59,213 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4451690.0, ans=0.125 2024-08-19 17:01:13,672 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4451790.0, ans=0.0 2024-08-19 17:01:16,046 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.12 vs. limit=15.0 2024-08-19 17:01:17,319 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 33 from LS+wenet, 14 from Vox, 44 from AS 2024-08-19 17:01:26,638 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.33 vs. limit=10.0 2024-08-19 17:01:27,123 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 600, loss[loss=0.07184, beats_loss=0.01298, ecapa_loss=0.0001019, whisper_loss=0.05784, over 16272.00 frames. ], tot_loss[loss=0.101, beats_loss=0.0103, ecapa_loss=0.000141, whisper_loss=0.08927, over 3602145.52 frames. 
], batch size: 65, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:02:02,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4452090.0, ans=0.0 2024-08-19 17:02:04,196 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.20 vs. limit=15.0 2024-08-19 17:02:13,902 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4452090.0, ans=0.125 2024-08-19 17:02:15,451 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4452090.0, ans=0.0 2024-08-19 17:02:22,118 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 25 from LS+wenet, 23 from Vox, 30 from AS 2024-08-19 17:02:23,699 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 30 from LS+wenet, 21 from Vox, 39 from AS 2024-08-19 17:02:38,432 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.243e+01 2.476e+01 2.748e+01 6.280e+01, threshold=4.953e+01, percent-clipped=2.0 2024-08-19 17:02:39,401 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4452290.0, ans=0.0 2024-08-19 17:02:53,185 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 650, loss[loss=0.1054, beats_loss=0.009913, ecapa_loss=0.0001476, whisper_loss=0.09401, over 22261.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01018, ecapa_loss=0.0001418, whisper_loss=0.09002, over 3645721.19 frames. ], batch size: 86, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:03:11,090 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 
24 from LS+wenet, 13 from Vox, 27 from AS 2024-08-19 17:03:18,241 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4452490.0, ans=0.125 2024-08-19 17:03:18,513 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=15.37 vs. limit=15.0 2024-08-19 17:03:19,560 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 17 from LS+wenet, 19 from Vox, 20 from AS 2024-08-19 17:03:36,775 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 23 from LS+wenet, 19 from Vox, 30 from AS 2024-08-19 17:03:40,657 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4452590.0, ans=0.1 2024-08-19 17:03:40,712 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4452590.0, ans=0.0 2024-08-19 17:03:55,274 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4452690.0, ans=0.0 2024-08-19 17:03:58,941 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.54 vs. limit=22.5 2024-08-19 17:04:03,491 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 31 from LS+wenet, 22 from Vox, 37 from AS 2024-08-19 17:04:08,683 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.63 vs. limit=15.0 2024-08-19 17:04:17,929 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 700, loss[loss=0.112, beats_loss=0.007587, ecapa_loss=0.0001219, whisper_loss=0.1032, over 15694.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01023, ecapa_loss=0.0001403, whisper_loss=0.08982, over 3659850.52 frames. 
], batch size: 58, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:04:54,072 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.00 vs. limit=12.0 2024-08-19 17:05:11,768 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.81 vs. limit=15.0 2024-08-19 17:05:27,769 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.718e+01 2.284e+01 2.551e+01 2.853e+01 6.068e+01, threshold=5.102e+01, percent-clipped=1.0 2024-08-19 17:05:41,201 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 750, loss[loss=0.1027, beats_loss=0.01102, ecapa_loss=0.0001178, whisper_loss=0.09045, over 23122.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01025, ecapa_loss=0.00014, whisper_loss=0.08924, over 3654577.04 frames. ], batch size: 90, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:05:46,760 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 17:06:01,214 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4453490.0, ans=0.125 2024-08-19 17:06:05,478 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4453490.0, ans=0.0 2024-08-19 17:06:10,134 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4453490.0, ans=0.125 2024-08-19 17:06:18,935 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4453590.0, ans=0.125 2024-08-19 17:07:07,256 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 800, loss[loss=0.09106, beats_loss=0.01062, ecapa_loss=0.0001339, whisper_loss=0.0791, over 16864.00 frames. 
], tot_loss[loss=0.101, beats_loss=0.01027, ecapa_loss=0.0001401, whisper_loss=0.08933, over 3671571.22 frames. ], batch size: 69, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:07:09,375 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4453890.0, ans=0.125 2024-08-19 17:07:31,262 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4453990.0, ans=0.1 2024-08-19 17:07:37,090 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4453990.0, ans=0.0 2024-08-19 17:07:38,731 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4453990.0, ans=0.0 2024-08-19 17:07:45,567 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4454090.0, ans=0.0 2024-08-19 17:07:58,666 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 26 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-19 17:08:19,100 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.286e+01 2.525e+01 2.905e+01 4.318e+01, threshold=5.049e+01, percent-clipped=0.0 2024-08-19 17:08:21,100 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 33 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-19 17:08:22,874 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 
32 from LS+wenet, 14 from Vox, 47 fro AS 2024-08-19 17:08:23,167 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4454290.0, ans=0.0 2024-08-19 17:08:26,461 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4454290.0, ans=0.125 2024-08-19 17:08:32,938 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 850, loss[loss=0.1097, beats_loss=0.009653, ecapa_loss=0.0001433, whisper_loss=0.0986, over 23565.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01037, ecapa_loss=0.0001383, whisper_loss=0.08876, over 3682788.79 frames. ], batch size: 92, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:08:56,915 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 27 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-19 17:09:23,467 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=6.340e-02 2024-08-19 17:09:54,282 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 19 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-19 17:09:59,474 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 900, loss[loss=0.1049, beats_loss=0.01139, ecapa_loss=0.0001333, whisper_loss=0.09219, over 21010.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01033, ecapa_loss=0.0001394, whisper_loss=0.0888, over 3725078.18 frames. ], batch size: 84, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:10:13,960 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4454890.0, ans=0.125 2024-08-19 17:10:21,926 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 
16 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-19 17:10:29,205 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4454990.0, ans=0.2 2024-08-19 17:11:00,550 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.43 vs. limit=12.0 2024-08-19 17:11:04,713 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4455190.0, ans=0.0 2024-08-19 17:11:06,495 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4455290.0, ans=0.0 2024-08-19 17:11:06,539 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4455290.0, ans=0.2 2024-08-19 17:11:11,412 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.845e+01 2.270e+01 2.580e+01 3.222e+01 2.488e+02, threshold=5.161e+01, percent-clipped=3.0 2024-08-19 17:11:24,785 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 950, loss[loss=0.1023, beats_loss=0.009375, ecapa_loss=0.0001317, whisper_loss=0.09161, over 16317.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01036, ecapa_loss=0.0001398, whisper_loss=0.08923, over 3726900.13 frames. ], batch size: 63, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:11:25,322 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4455390.0, ans=0.04949747468305833 2024-08-19 17:11:29,713 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 
17 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-19 17:11:39,877 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4455490.0, ans=0.2 2024-08-19 17:11:56,406 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4455590.0, ans=0.0 2024-08-19 17:11:58,085 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4455590.0, ans=0.0 2024-08-19 17:12:05,064 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4455590.0, ans=0.125 2024-08-19 17:12:11,910 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4455590.0, ans=0.0 2024-08-19 17:12:40,990 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4455790.0, ans=0.1 2024-08-19 17:12:42,751 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4455790.0, ans=0.125 2024-08-19 17:12:49,075 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 1000, loss[loss=0.1028, beats_loss=0.01159, ecapa_loss=0.0001313, whisper_loss=0.08994, over 23269.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01035, ecapa_loss=0.0001404, whisper_loss=0.08905, over 3735461.39 frames. ], batch size: 93, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:12:49,229 INFO [train_multi_KD3.py:845] (1/4) A total of 95 cuts. 23 from LS+wenet, 21 from Vox, 51 fro AS 2024-08-19 17:13:16,561 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 
29 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-19 17:13:28,686 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4456090.0, ans=0.125 2024-08-19 17:13:59,470 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.231e+01 2.578e+01 2.915e+01 8.708e+01, threshold=5.156e+01, percent-clipped=1.0 2024-08-19 17:14:03,550 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4456290.0, ans=0.125 2024-08-19 17:14:05,524 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.55 vs. limit=15.0 2024-08-19 17:14:11,307 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 20 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-19 17:14:12,623 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 1050, loss[loss=0.1072, beats_loss=0.01194, ecapa_loss=0.0001139, whisper_loss=0.09411, over 16171.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01031, ecapa_loss=0.0001395, whisper_loss=0.08915, over 3726576.83 frames. ], batch size: 61, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:14:16,636 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 17 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-19 17:14:18,751 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4456390.0, ans=0.2 2024-08-19 17:14:31,205 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4456490.0, ans=0.125 2024-08-19 17:14:46,000 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4456590.0, ans=0.125 2024-08-19 17:14:47,394 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 
22 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-19 17:14:52,884 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 21 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-19 17:14:54,777 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 25 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-19 17:15:12,947 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4456690.0, ans=0.125 2024-08-19 17:15:37,030 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 1100, loss[loss=0.1208, beats_loss=0.009614, ecapa_loss=0.0001464, whisper_loss=0.1097, over 22008.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01034, ecapa_loss=0.0001395, whisper_loss=0.08852, over 3699114.51 frames. ], batch size: 88, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:15:43,221 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4456890.0, ans=0.2 2024-08-19 17:15:44,374 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 31 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-19 17:15:47,867 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 25 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-19 17:16:10,927 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4457090.0, ans=0.0 2024-08-19 17:16:28,729 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 23 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-19 17:16:34,055 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 30 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-19 17:16:34,249 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4457190.0, ans=0.05 2024-08-19 17:16:39,170 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 
25 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-19 17:16:48,536 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.733e+01 2.222e+01 2.508e+01 2.804e+01 3.305e+02, threshold=5.015e+01, percent-clipped=2.0 2024-08-19 17:16:49,220 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4457290.0, ans=0.04949747468305833 2024-08-19 17:17:00,396 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 30 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-19 17:17:01,924 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 1150, loss[loss=0.1155, beats_loss=0.007326, ecapa_loss=0.0001652, whisper_loss=0.1066, over 19120.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01027, ecapa_loss=0.0001398, whisper_loss=0.08869, over 3728567.81 frames. ], batch size: 75, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:17:12,099 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4457390.0, ans=0.0 2024-08-19 17:17:15,302 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4457390.0, ans=0.1 2024-08-19 17:17:28,965 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4457490.0, ans=0.125 2024-08-19 17:17:30,484 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.95 vs. limit=15.0 2024-08-19 17:17:31,335 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 
20 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-19 17:17:38,154 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4457590.0, ans=0.1 2024-08-19 17:17:46,348 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4457590.0, ans=10.0 2024-08-19 17:17:58,169 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4457690.0, ans=0.1 2024-08-19 17:18:18,043 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=5.125e+05 2024-08-19 17:18:26,403 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 1200, loss[loss=0.06772, beats_loss=0.01196, ecapa_loss=0.0001407, whisper_loss=0.05435, over 16330.00 frames. ], tot_loss[loss=0.1, beats_loss=0.01035, ecapa_loss=0.0001396, whisper_loss=0.08829, over 3701437.27 frames. ], batch size: 69, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:18:26,921 INFO [train_multi_KD3.py:845] (1/4) A total of 95 cuts. 
23 from LS+wenet, 20 from Vox, 52 fro AS 2024-08-19 17:18:29,984 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4457890.0, ans=0.125 2024-08-19 17:18:51,958 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4457990.0, ans=0.2 2024-08-19 17:18:53,538 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4457990.0, ans=0.125 2024-08-19 17:19:19,984 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4458190.0, ans=0.2 2024-08-19 17:19:30,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4458190.0, ans=0.0 2024-08-19 17:19:33,091 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 13 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-19 17:19:36,583 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.677e+01 2.243e+01 2.469e+01 2.791e+01 3.736e+01, threshold=4.938e+01, percent-clipped=0.0 2024-08-19 17:19:50,490 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 1250, loss[loss=0.06097, beats_loss=0.01177, ecapa_loss=0.0001605, whisper_loss=0.0476, over 12357.00 frames. ], tot_loss[loss=0.09988, beats_loss=0.01039, ecapa_loss=0.0001387, whisper_loss=0.0881, over 3720420.22 frames. 
], batch size: 51, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:20:02,665 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4458390.0, ans=0.125 2024-08-19 17:20:12,269 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4458490.0, ans=0.0 2024-08-19 17:20:15,611 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4458490.0, ans=0.125 2024-08-19 17:20:28,558 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 16 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-19 17:20:46,666 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4458690.0, ans=0.035 2024-08-19 17:21:00,582 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=4458790.0, ans=10.0 2024-08-19 17:21:15,038 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 1300, loss[loss=0.1043, beats_loss=0.009145, ecapa_loss=0.0001383, whisper_loss=0.09378, over 20600.00 frames. ], tot_loss[loss=0.09981, beats_loss=0.01034, ecapa_loss=0.0001396, whisper_loss=0.08807, over 3739182.12 frames. ], batch size: 78, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:21:20,093 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4458890.0, ans=0.125 2024-08-19 17:21:23,238 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4458890.0, ans=0.0 2024-08-19 17:21:31,627 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 
23 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-19 17:21:31,750 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4458990.0, ans=0.1 2024-08-19 17:21:36,550 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4458990.0, ans=0.125 2024-08-19 17:21:41,285 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 18 from LS+wenet, 10 from Vox, 36 fro AS 2024-08-19 17:21:51,327 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 21 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-19 17:21:59,871 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 17 from LS+wenet, 18 from Vox, 15 fro AS 2024-08-19 17:22:11,133 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 19 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-19 17:22:16,663 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4459190.0, ans=0.2 2024-08-19 17:22:21,961 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.64 vs. limit=15.0 2024-08-19 17:22:23,068 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4459290.0, ans=0.1 2024-08-19 17:22:23,940 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.764e+01 2.156e+01 2.325e+01 2.640e+01 4.207e+01, threshold=4.651e+01, percent-clipped=0.0 2024-08-19 17:22:37,395 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 1350, loss[loss=0.08716, beats_loss=0.01026, ecapa_loss=0.000106, whisper_loss=0.07585, over 15028.00 frames. ], tot_loss[loss=0.09967, beats_loss=0.01036, ecapa_loss=0.0001392, whisper_loss=0.08792, over 3731724.40 frames. 
], batch size: 56, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:22:48,269 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.41 vs. limit=15.0 2024-08-19 17:23:19,410 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.35 vs. limit=15.0 2024-08-19 17:23:43,885 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4459790.0, ans=0.125 2024-08-19 17:23:56,217 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4459790.0, ans=0.125 2024-08-19 17:23:59,857 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 1400, loss[loss=0.09783, beats_loss=0.008493, ecapa_loss=0.0001567, whisper_loss=0.08777, over 20091.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01028, ecapa_loss=0.00014, whisper_loss=0.08899, over 3762859.11 frames. ], batch size: 78, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:24:15,458 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 24 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-19 17:24:57,093 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4460190.0, ans=0.125 2024-08-19 17:24:59,964 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4460190.0, ans=0.2 2024-08-19 17:25:03,359 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4460190.0, ans=0.0 2024-08-19 17:25:04,689 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 
24 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-19 17:25:11,243 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.246e+01 2.446e+01 2.816e+01 8.915e+01, threshold=4.891e+01, percent-clipped=2.0 2024-08-19 17:25:21,153 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4460290.0, ans=0.0 2024-08-19 17:25:24,147 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 1450, loss[loss=0.08675, beats_loss=0.01171, ecapa_loss=0.0001084, whisper_loss=0.07396, over 14165.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01023, ecapa_loss=0.00014, whisper_loss=0.08863, over 3733865.76 frames. ], batch size: 55, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:25:33,655 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.35 vs. limit=15.0 2024-08-19 17:25:34,302 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 28 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-19 17:25:46,548 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4460490.0, ans=0.0 2024-08-19 17:25:46,888 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.26 vs. 
limit=15.0 2024-08-19 17:26:03,821 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4460590.0, ans=0.2 2024-08-19 17:26:35,184 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4460790.0, ans=0.125 2024-08-19 17:26:43,734 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4460790.0, ans=0.0 2024-08-19 17:26:46,814 WARNING [optim.py:496] (1/4) Scaling gradients by 0.051883164793252945, model_norm_threshold=48.91460418701172 2024-08-19 17:26:46,971 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.10, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.482e+04, grad_sumsq=2.578e+04, orig_rms_sq=3.290e+00 2024-08-19 17:26:53,313 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 1500, loss[loss=0.08475, beats_loss=0.009346, ecapa_loss=0.0001577, whisper_loss=0.07382, over 14684.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01028, ecapa_loss=0.0001384, whisper_loss=0.08853, over 3731533.15 frames. ], batch size: 59, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:26:53,788 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4460890.0, ans=0.125 2024-08-19 17:27:06,268 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4460890.0, ans=0.1 2024-08-19 17:27:07,875 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4460890.0, ans=0.125 2024-08-19 17:27:10,735 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 22 from LS+wenet, 21 from Vox, 17 fro AS 2024-08-19 17:27:29,561 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 
27 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-19 17:27:41,876 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4461090.0, ans=0.0 2024-08-19 17:27:43,918 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4461090.0, ans=10.0 2024-08-19 17:28:09,539 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.242e+01 2.515e+01 2.868e+01 9.428e+02, threshold=5.031e+01, percent-clipped=1.0 2024-08-19 17:28:09,787 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 36 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-19 17:28:16,781 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.47 vs. limit=22.5 2024-08-19 17:28:22,810 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 1550, loss[loss=0.1072, beats_loss=0.008388, ecapa_loss=0.0001445, whisper_loss=0.09739, over 19046.00 frames. ], tot_loss[loss=0.09977, beats_loss=0.01029, ecapa_loss=0.0001393, whisper_loss=0.08808, over 3705416.70 frames. ], batch size: 73, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:28:37,544 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4461390.0, ans=0.1 2024-08-19 17:28:37,595 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4461390.0, ans=0.09899494936611666 2024-08-19 17:28:44,431 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 21 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-19 17:29:00,662 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.54 vs. 
limit=15.0 2024-08-19 17:29:10,922 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4461590.0, ans=0.0 2024-08-19 17:29:13,924 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 26 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-19 17:29:26,720 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 23 from LS+wenet, 22 from Vox, 49 fro AS 2024-08-19 17:29:50,173 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 1600, loss[loss=0.1167, beats_loss=0.01106, ecapa_loss=9.321e-05, whisper_loss=0.1047, over 22688.00 frames. ], tot_loss[loss=0.1, beats_loss=0.01035, ecapa_loss=0.0001387, whisper_loss=0.08828, over 3700746.62 frames. ], batch size: 81, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:30:01,357 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4461890.0, ans=0.125 2024-08-19 17:30:06,540 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.00 vs. limit=6.0 2024-08-19 17:30:20,818 INFO [train_multi_KD3.py:845] (1/4) A total of 52 cuts. 
16 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-19 17:30:32,488 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4462090.0, ans=0.125 2024-08-19 17:30:37,675 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4462090.0, ans=0.1 2024-08-19 17:30:43,243 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4462190.0, ans=0.1 2024-08-19 17:31:02,161 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4462290.0, ans=0.125 2024-08-19 17:31:03,518 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.591e+01 2.243e+01 2.444e+01 2.645e+01 4.310e+01, threshold=4.888e+01, percent-clipped=0.0 2024-08-19 17:31:09,566 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 16 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-19 17:31:16,377 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 21 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-19 17:31:17,910 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 1650, loss[loss=0.08964, beats_loss=0.01204, ecapa_loss=0.0001608, whisper_loss=0.07599, over 16944.00 frames. ], tot_loss[loss=0.09995, beats_loss=0.0104, ecapa_loss=0.000138, whisper_loss=0.08817, over 3724665.95 frames. ], batch size: 73, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:31:43,771 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=4462490.0, ans=6.0 2024-08-19 17:31:46,629 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 16 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-19 17:31:58,842 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 
19 from LS+wenet, 23 from Vox, 30 from AS 2024-08-19 17:32:08,028 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.51 vs. limit=12.0 2024-08-19 17:32:17,464 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 20 from LS+wenet, 12 from Vox, 31 from AS 2024-08-19 17:32:27,810 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.87 vs. limit=15.0 2024-08-19 17:32:31,174 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4462790.0, ans=0.125 2024-08-19 17:32:31,509 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.26 vs. limit=15.0 2024-08-19 17:32:38,076 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4462790.0, ans=0.125 2024-08-19 17:32:43,017 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 1700, loss[loss=0.1062, beats_loss=0.01032, ecapa_loss=0.0001188, whisper_loss=0.09465, over 15804.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01032, ecapa_loss=0.0001388, whisper_loss=0.08863, over 3729830.05 frames. ], batch size: 61, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:32:52,086 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 28 from LS+wenet, 23 from Vox, 20 from AS 2024-08-19 17:32:52,316 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4462890.0, ans=0.0 2024-08-19 17:33:30,150 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4463090.0, ans=0.0 2024-08-19 17:33:33,327 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 
35 from LS+wenet, 16 from Vox, 37 from AS 2024-08-19 17:33:34,617 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 19 from LS+wenet, 13 from Vox, 21 from AS 2024-08-19 17:33:54,687 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.927e+01 2.265e+01 2.450e+01 2.804e+01 4.783e+01, threshold=4.900e+01, percent-clipped=0.0 2024-08-19 17:34:02,253 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 26 from LS+wenet, 22 from Vox, 35 from AS 2024-08-19 17:34:02,447 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4463290.0, ans=0.2 2024-08-19 17:34:08,170 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 1750, loss[loss=0.08426, beats_loss=0.01195, ecapa_loss=0.0001299, whisper_loss=0.07101, over 20991.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01033, ecapa_loss=0.0001371, whisper_loss=0.08865, over 3749593.95 frames. ], batch size: 85, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:34:12,130 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4463390.0, ans=0.0 2024-08-19 17:34:15,392 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4463390.0, ans=0.025 2024-08-19 17:34:27,499 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4463490.0, ans=0.125 2024-08-19 17:34:33,084 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 
33 from LS+wenet, 17 from Vox, 41 from AS 2024-08-19 17:34:35,255 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4463490.0, ans=0.125 2024-08-19 17:35:10,110 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4463690.0, ans=0.035 2024-08-19 17:35:25,028 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 17:35:28,554 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4463790.0, ans=0.0 2024-08-19 17:35:31,774 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 1800, loss[loss=0.1077, beats_loss=0.01089, ecapa_loss=0.0001434, whisper_loss=0.0954, over 16923.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01036, ecapa_loss=0.0001379, whisper_loss=0.08871, over 3745889.37 frames. ], batch size: 69, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:35:49,718 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 31 from LS+wenet, 18 from Vox, 42 from AS 2024-08-19 17:35:51,831 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=4463990.0, ans=0.02 2024-08-19 17:35:54,974 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4463990.0, ans=0.125 2024-08-19 17:35:57,996 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 25 from LS+wenet, 15 from Vox, 29 from AS 2024-08-19 17:36:06,875 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 
25 from LS+wenet, 19 from Vox, 39 from AS 2024-08-19 17:36:07,125 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4464090.0, ans=0.1 2024-08-19 17:36:23,487 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4464190.0, ans=0.125 2024-08-19 17:36:24,943 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4464190.0, ans=0.125 2024-08-19 17:36:35,876 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 23 from LS+wenet, 19 from Vox, 32 from AS 2024-08-19 17:36:36,103 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4464290.0, ans=0.125 2024-08-19 17:36:40,214 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.688e+01 2.242e+01 2.530e+01 2.809e+01 4.955e+01, threshold=5.060e+01, percent-clipped=1.0 2024-08-19 17:36:53,835 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 1850, loss[loss=0.1089, beats_loss=0.009738, ecapa_loss=0.0001365, whisper_loss=0.0978, over 23528.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01036, ecapa_loss=0.0001379, whisper_loss=0.08874, over 3770418.44 frames. ], batch size: 93, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:37:08,805 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 20 from LS+wenet, 15 from Vox, 21 from AS 2024-08-19 17:37:08,981 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4464490.0, ans=0.0 2024-08-19 17:37:31,959 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 24 from LS+wenet, 19 from Vox, 22 from AS 2024-08-19 17:37:39,263 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 
14 from LS+wenet, 19 from Vox, 35 from AS 2024-08-19 17:37:40,705 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 22 from LS+wenet, 31 from Vox, 23 from AS 2024-08-19 17:37:42,674 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4464690.0, ans=0.0 2024-08-19 17:37:48,587 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.30 vs. limit=12.0 2024-08-19 17:38:06,225 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 36 from LS+wenet, 23 from Vox, 29 from AS 2024-08-19 17:38:17,775 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 1900, loss[loss=0.1181, beats_loss=0.01028, ecapa_loss=0.0001092, whisper_loss=0.1067, over 23678.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01033, ecapa_loss=0.0001375, whisper_loss=0.08838, over 3755355.33 frames. ], batch size: 90, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:38:25,100 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4464890.0, ans=0.125 2024-08-19 17:38:36,645 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 19 from LS+wenet, 24 from Vox, 48 from AS 2024-08-19 17:39:13,841 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 27 from LS+wenet, 25 from Vox, 29 from AS 2024-08-19 17:39:16,012 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4465190.0, ans=0.1 2024-08-19 17:39:17,587 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 
26 from LS+wenet, 21 from Vox, 37 from AS 2024-08-19 17:39:28,275 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.273e+01 2.509e+01 2.742e+01 5.984e+01, threshold=5.017e+01, percent-clipped=1.0 2024-08-19 17:39:41,914 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 1950, loss[loss=0.09451, beats_loss=0.01214, ecapa_loss=0.0001334, whisper_loss=0.08104, over 15599.00 frames. ], tot_loss[loss=0.09973, beats_loss=0.01041, ecapa_loss=0.0001363, whisper_loss=0.08796, over 3758303.41 frames. ], batch size: 61, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:40:19,396 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4465590.0, ans=0.07 2024-08-19 17:40:20,995 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.24 vs. limit=15.0 2024-08-19 17:40:34,190 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4465690.0, ans=0.0 2024-08-19 17:40:56,500 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=4465790.0, ans=10.0 2024-08-19 17:40:58,037 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4465790.0, ans=0.04949747468305833 2024-08-19 17:41:03,024 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=5.634e-01 2024-08-19 17:41:07,639 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 2000, loss[loss=0.09616, beats_loss=0.01229, ecapa_loss=0.000109, whisper_loss=0.08278, over 15296.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01039, ecapa_loss=0.0001362, whisper_loss=0.08841, over 3763605.88 frames. 
], batch size: 59, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:41:19,819 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4465890.0, ans=0.0 2024-08-19 17:41:22,995 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4465990.0, ans=0.125 2024-08-19 17:41:43,427 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=4466090.0, ans=0.95 2024-08-19 17:41:43,777 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.22 vs. limit=15.0 2024-08-19 17:41:51,889 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 19 from LS+wenet, 20 from Vox, 32 from AS 2024-08-19 17:41:55,188 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4466090.0, ans=0.0 2024-08-19 17:42:08,772 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 21 from LS+wenet, 18 from Vox, 34 from AS 2024-08-19 17:42:16,498 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.72 vs. limit=22.5 2024-08-19 17:42:18,680 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.386e+01 2.592e+01 2.882e+01 2.246e+02, threshold=5.185e+01, percent-clipped=4.0 2024-08-19 17:42:19,172 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4466290.0, ans=0.1 2024-08-19 17:42:21,010 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 23 from LS+wenet, 20 from Vox, 39 from AS 2024-08-19 17:42:30,652 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 
23 from LS+wenet, 30 from Vox, 41 from AS 2024-08-19 17:42:31,612 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 2050, loss[loss=0.0848, beats_loss=0.01142, ecapa_loss=0.0001276, whisper_loss=0.07211, over 22647.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01037, ecapa_loss=0.0001352, whisper_loss=0.0885, over 3765175.11 frames. ], batch size: 94, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:42:32,645 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4466390.0, ans=0.125 2024-08-19 17:42:47,110 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 21 from LS+wenet, 21 from Vox, 26 from AS 2024-08-19 17:42:53,819 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 20 from LS+wenet, 21 from Vox, 25 from AS 2024-08-19 17:42:57,541 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 24 from LS+wenet, 24 from Vox, 23 from AS 2024-08-19 17:43:25,378 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4466690.0, ans=0.125 2024-08-19 17:43:33,728 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4466690.0, ans=0.0 2024-08-19 17:43:39,216 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4466690.0, ans=0.1 2024-08-19 17:43:42,601 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4466790.0, ans=0.125 2024-08-19 17:43:52,786 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4466790.0, ans=0.2 2024-08-19 17:43:58,163 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4466890.0, ans=0.125 2024-08-19 17:43:58,915 INFO [train_multi_KD3.py:1117] (1/4) 
Epoch 31, batch 2100, loss[loss=0.09335, beats_loss=0.01199, ecapa_loss=0.0001276, whisper_loss=0.08009, over 21752.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.0104, ecapa_loss=0.0001355, whisper_loss=0.08838, over 3773735.50 frames. ], batch size: 88, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:44:18,268 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4466990.0, ans=0.5 2024-08-19 17:44:27,284 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4466990.0, ans=0.0 2024-08-19 17:44:27,484 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.70 vs. limit=15.0 2024-08-19 17:44:30,605 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4466990.0, ans=0.1 2024-08-19 17:44:44,701 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.90 vs. limit=10.0 2024-08-19 17:45:06,242 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4467290.0, ans=0.125 2024-08-19 17:45:10,670 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.698e+01 2.301e+01 2.618e+01 2.880e+01 6.452e+01, threshold=5.236e+01, percent-clipped=1.0 2024-08-19 17:45:24,068 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 2150, loss[loss=0.1177, beats_loss=0.01075, ecapa_loss=0.0001081, whisper_loss=0.1059, over 16004.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01046, ecapa_loss=0.000134, whisper_loss=0.08838, over 3766663.74 frames. 
], batch size: 59, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:45:24,526 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4467390.0, ans=0.125 2024-08-19 17:45:31,543 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 21 from LS+wenet, 13 from Vox, 31 from AS 2024-08-19 17:45:35,390 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4467390.0, ans=0.125 2024-08-19 17:45:53,272 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.035e+02 2024-08-19 17:45:55,965 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 24 from LS+wenet, 21 from Vox, 37 from AS 2024-08-19 17:46:02,759 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 20 from LS+wenet, 21 from Vox, 20 from AS 2024-08-19 17:46:04,477 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 22 from LS+wenet, 21 from Vox, 38 from AS 2024-08-19 17:46:07,970 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4467590.0, ans=0.125 2024-08-19 17:46:07,985 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4467590.0, ans=0.2 2024-08-19 17:46:09,877 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 
23 from LS+wenet, 28 from Vox, 40 from AS 2024-08-19 17:46:15,085 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 17:46:15,189 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4467690.0, ans=0.125 2024-08-19 17:46:22,148 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4467690.0, ans=0.125 2024-08-19 17:46:51,889 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 2200, loss[loss=0.1183, beats_loss=0.009671, ecapa_loss=0.0001334, whisper_loss=0.1073, over 23598.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01047, ecapa_loss=0.0001344, whisper_loss=0.08843, over 3776336.07 frames. ], batch size: 91, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:46:52,283 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4467890.0, ans=0.0 2024-08-19 17:47:02,046 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4467890.0, ans=0.0 2024-08-19 17:47:06,455 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 13 from LS+wenet, 19 from Vox, 30 from AS 2024-08-19 17:47:06,694 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4467890.0, ans=0.2 2024-08-19 17:47:15,697 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4467990.0, ans=0.0 2024-08-19 17:47:20,354 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4467990.0, ans=0.04949747468305833 2024-08-19 17:47:23,564 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 
25 from LS+wenet, 15 from Vox, 30 from AS 2024-08-19 17:47:37,166 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 17 from LS+wenet, 21 from Vox, 21 from AS 2024-08-19 17:47:44,186 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4468190.0, ans=0.1 2024-08-19 17:47:52,796 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4468190.0, ans=0.125 2024-08-19 17:48:04,920 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.718e+01 2.319e+01 2.611e+01 2.846e+01 3.358e+02, threshold=5.223e+01, percent-clipped=1.0 2024-08-19 17:48:09,937 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4468290.0, ans=0.125 2024-08-19 17:48:16,134 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 22 from LS+wenet, 23 from Vox, 25 from AS 2024-08-19 17:48:17,617 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 2250, loss[loss=0.1066, beats_loss=0.01001, ecapa_loss=0.0001558, whisper_loss=0.095, over 16811.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01039, ecapa_loss=0.0001363, whisper_loss=0.0892, over 3763054.35 frames. ], batch size: 70, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:48:21,588 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 25 from LS+wenet, 23 from Vox, 41 from AS 2024-08-19 17:48:26,447 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4468390.0, ans=0.1 2024-08-19 17:48:42,120 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4468490.0, ans=0.0 2024-08-19 17:48:44,170 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.21 vs. 
limit=22.5 2024-08-19 17:49:21,236 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4468690.0, ans=0.2 2024-08-19 17:49:23,215 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4468690.0, ans=0.125 2024-08-19 17:49:38,057 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4468790.0, ans=0.0 2024-08-19 17:49:42,745 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 2300, loss[loss=0.1038, beats_loss=0.009641, ecapa_loss=0.0001623, whisper_loss=0.09249, over 16846.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01036, ecapa_loss=0.0001383, whisper_loss=0.09002, over 3778721.62 frames. ], batch size: 69, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:49:43,241 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4468890.0, ans=0.0 2024-08-19 17:49:48,834 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4468890.0, ans=0.125 2024-08-19 17:49:51,874 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 17:50:00,481 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 17 from LS+wenet, 21 from Vox, 33 from AS 2024-08-19 17:50:05,177 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 32 from LS+wenet, 28 from Vox, 34 from AS 2024-08-19 17:50:12,130 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 16 from LS+wenet, 16 from Vox, 19 from AS 2024-08-19 17:50:19,535 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4469090.0, ans=0.0 2024-08-19 17:50:26,112 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 
30 from LS+wenet, 34 from Vox, 25 from AS 2024-08-19 17:50:35,037 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4469190.0, ans=0.125 2024-08-19 17:50:48,960 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.55 vs. limit=15.0 2024-08-19 17:50:54,327 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.331e+01 2.598e+01 2.979e+01 4.563e+01, threshold=5.195e+01, percent-clipped=0.0 2024-08-19 17:51:07,667 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 2350, loss[loss=0.1006, beats_loss=0.009666, ecapa_loss=0.000176, whisper_loss=0.08915, over 18593.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01043, ecapa_loss=0.0001389, whisper_loss=0.08953, over 3768694.45 frames. ], batch size: 75, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 17:51:09,670 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 
17 from LS+wenet, 19 from Vox, 27 from AS 2024-08-19 17:51:23,537 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4469490.0, ans=0.125 2024-08-19 17:51:49,608 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4469590.0, ans=0.1 2024-08-19 17:51:53,180 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4469590.0, ans=0.125 2024-08-19 17:51:58,483 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4469690.0, ans=0.0 2024-08-19 17:52:08,885 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4469690.0, ans=0.0 2024-08-19 17:52:23,645 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.00 vs. limit=15.0 2024-08-19 17:52:26,865 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4469790.0, ans=0.125 2024-08-19 17:52:33,338 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 2400, loss[loss=0.1125, beats_loss=0.01098, ecapa_loss=0.0001419, whisper_loss=0.1001, over 15906.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01049, ecapa_loss=0.0001391, whisper_loss=0.0889, over 3787436.30 frames. 
], batch size: 63, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 17:52:39,984 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4469890.0, ans=0.125 2024-08-19 17:52:46,673 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4469890.0, ans=0.125 2024-08-19 17:53:10,089 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.29 vs. limit=22.5 2024-08-19 17:53:18,113 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4470090.0, ans=0.125 2024-08-19 17:53:24,346 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4470190.0, ans=0.125 2024-08-19 17:53:29,925 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4470190.0, ans=0.125 2024-08-19 17:53:38,833 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 19 from LS+wenet, 19 from Vox, 27 from AS 2024-08-19 17:53:46,701 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.352e+01 2.498e+01 2.702e+01 6.582e+01, threshold=4.996e+01, percent-clipped=0.0 2024-08-19 17:53:50,808 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4470290.0, ans=0.125 2024-08-19 17:53:55,744 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4470290.0, ans=0.125 2024-08-19 17:54:01,060 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 2450, loss[loss=0.08188, beats_loss=0.01012, ecapa_loss=0.0001534, whisper_loss=0.07022, over 14801.00 frames. 
], tot_loss[loss=0.1012, beats_loss=0.01036, ecapa_loss=0.0001395, whisper_loss=0.08942, over 3768048.25 frames. ], batch size: 59, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 17:54:07,281 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4470390.0, ans=0.125 2024-08-19 17:54:09,796 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.82 vs. limit=15.0 2024-08-19 17:54:11,767 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.11 vs. limit=10.0 2024-08-19 17:54:36,739 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4470590.0, ans=0.0 2024-08-19 17:54:44,640 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 23 from LS+wenet, 28 from Vox, 28 from AS 2024-08-19 17:54:57,036 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4470690.0, ans=0.125 2024-08-19 17:55:06,445 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 21 from LS+wenet, 16 from Vox, 24 from AS 2024-08-19 17:55:12,268 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.75 vs. limit=15.0 2024-08-19 17:55:28,490 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 2500, loss[loss=0.1099, beats_loss=0.009571, ecapa_loss=0.0001115, whisper_loss=0.09922, over 13776.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01039, ecapa_loss=0.00014, whisper_loss=0.08833, over 3740883.52 frames. 
], batch size: 49, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 17:55:28,937 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4470890.0, ans=0.09899494936611666 2024-08-19 17:55:28,937 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4470890.0, ans=0.0 2024-08-19 17:55:30,701 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4470890.0, ans=0.1 2024-08-19 17:55:32,450 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4470890.0, ans=0.07 2024-08-19 17:55:36,313 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.81 vs. limit=22.5 2024-08-19 17:55:41,335 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4470890.0, ans=0.125 2024-08-19 17:55:56,160 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4470990.0, ans=0.125 2024-08-19 17:56:04,908 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.08 vs. 
limit=15.0 2024-08-19 17:56:17,055 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4471090.0, ans=0.1 2024-08-19 17:56:17,091 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4471090.0, ans=0.125 2024-08-19 17:56:21,485 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4471190.0, ans=0.125 2024-08-19 17:56:28,427 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.045e-01 2024-08-19 17:56:33,620 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 35 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-19 17:56:33,892 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4471190.0, ans=0.0 2024-08-19 17:56:40,471 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.734e+01 2.297e+01 2.536e+01 2.855e+01 4.497e+01, threshold=5.072e+01, percent-clipped=1.0 2024-08-19 17:56:54,079 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 2550, loss[loss=0.101, beats_loss=0.009937, ecapa_loss=0.0001235, whisper_loss=0.08988, over 17449.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01041, ecapa_loss=0.0001381, whisper_loss=0.08903, over 3753038.01 frames. ], batch size: 65, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 17:57:03,455 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4471390.0, ans=0.0 2024-08-19 17:57:12,425 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4471490.0, ans=0.125 2024-08-19 17:57:26,190 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.23 vs. 
limit=15.0 2024-08-19 17:57:41,167 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4471590.0, ans=0.0 2024-08-19 17:57:55,327 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4471690.0, ans=0.2 2024-08-19 17:58:06,653 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4471790.0, ans=0.1 2024-08-19 17:58:08,227 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 24 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-19 17:58:17,086 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4471790.0, ans=0.125 2024-08-19 17:58:18,179 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 20 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-19 17:58:19,397 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 2600, loss[loss=0.1026, beats_loss=0.008823, ecapa_loss=0.0001353, whisper_loss=0.09245, over 16356.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01038, ecapa_loss=0.0001391, whisper_loss=0.0888, over 3771856.25 frames. ], batch size: 62, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 17:58:25,449 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4471890.0, ans=0.0 2024-08-19 17:58:27,088 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 28 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-19 17:58:30,656 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=4471890.0, ans=0.05 2024-08-19 17:58:46,656 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4471990.0, ans=0.0 2024-08-19 17:58:49,862 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 
22 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-19 17:59:00,168 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 27 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-19 17:59:07,629 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4472090.0, ans=0.125 2024-08-19 17:59:07,692 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4472090.0, ans=0.0 2024-08-19 17:59:13,081 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4472190.0, ans=0.125 2024-08-19 17:59:16,192 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4472190.0, ans=0.125 2024-08-19 17:59:21,639 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4472190.0, ans=0.0 2024-08-19 17:59:33,103 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.692e+01 2.323e+01 2.520e+01 2.771e+01 4.731e+01, threshold=5.040e+01, percent-clipped=0.0 2024-08-19 17:59:34,225 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4472290.0, ans=0.07 2024-08-19 17:59:47,284 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 2650, loss[loss=0.08722, beats_loss=0.00854, ecapa_loss=0.0001889, whisper_loss=0.07679, over 17264.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01037, ecapa_loss=0.0001395, whisper_loss=0.08956, over 3805781.13 frames. ], batch size: 71, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:00:30,698 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 20 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-19 18:00:37,658 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 
27 from LS+wenet, 20 from Vox, 41 from AS 2024-08-19 18:00:45,283 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4472690.0, ans=0.125 2024-08-19 18:00:45,320 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4472690.0, ans=0.2 2024-08-19 18:00:55,822 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4472690.0, ans=0.0 2024-08-19 18:01:01,184 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4472790.0, ans=0.125 2024-08-19 18:01:15,474 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4472790.0, ans=0.125 2024-08-19 18:01:18,033 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 2700, loss[loss=0.09444, beats_loss=0.01167, ecapa_loss=0.0001278, whisper_loss=0.08149, over 20022.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01035, ecapa_loss=0.0001404, whisper_loss=0.08986, over 3810593.91 frames. ], batch size: 78, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:01:52,404 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 32 from LS+wenet, 19 from Vox, 41 from AS 2024-08-19 18:02:31,189 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.406e+01 2.691e+01 2.981e+01 2.904e+02, threshold=5.383e+01, percent-clipped=2.0 2024-08-19 18:02:45,275 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 2750, loss[loss=0.111, beats_loss=0.008262, ecapa_loss=0.0001655, whisper_loss=0.101, over 15839.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01044, ecapa_loss=0.0001411, whisper_loss=0.08951, over 3795147.49 frames. ], batch size: 63, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:02:57,493 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts.
16 from LS+wenet, 12 from Vox, 31 from AS 2024-08-19 18:02:59,984 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.26 vs. limit=22.5 2024-08-19 18:03:08,525 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 22 from LS+wenet, 18 from Vox, 31 from AS 2024-08-19 18:03:11,952 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 32 from LS+wenet, 26 from Vox, 34 from AS 2024-08-19 18:03:34,837 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4473590.0, ans=0.125 2024-08-19 18:03:37,298 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.85 vs. limit=15.0 2024-08-19 18:03:39,190 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4473690.0, ans=0.125 2024-08-19 18:03:51,385 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4473690.0, ans=0.125 2024-08-19 18:03:58,629 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 17 from LS+wenet, 20 from Vox, 24 from AS 2024-08-19 18:04:03,517 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 22 from LS+wenet, 18 from Vox, 24 from AS 2024-08-19 18:04:05,404 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4473790.0, ans=0.125 2024-08-19 18:04:12,158 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 29 from LS+wenet, 28 from Vox, 29 from AS 2024-08-19 18:04:13,813 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 2800, loss[loss=0.107, beats_loss=0.0101, ecapa_loss=0.000192, whisper_loss=0.09497, over 19882.00 frames.
], tot_loss[loss=0.1013, beats_loss=0.01038, ecapa_loss=0.0001406, whisper_loss=0.08949, over 3769767.12 frames. ], batch size: 86, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:04:31,225 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.51 vs. limit=15.0 2024-08-19 18:04:49,463 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4474090.0, ans=0.5 2024-08-19 18:05:01,051 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4474090.0, ans=0.0 2024-08-19 18:05:07,965 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4474190.0, ans=0.125 2024-08-19 18:05:08,053 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4474190.0, ans=0.125 2024-08-19 18:05:28,660 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.228e+01 2.485e+01 2.852e+01 2.973e+02, threshold=4.969e+01, percent-clipped=1.0 2024-08-19 18:05:43,145 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 2850, loss[loss=0.112, beats_loss=0.01053, ecapa_loss=0.0001396, whisper_loss=0.1, over 20618.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01034, ecapa_loss=0.0001401, whisper_loss=0.08988, over 3754097.59 frames. ], batch size: 83, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:05:54,146 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4474390.0, ans=0.125 2024-08-19 18:06:03,421 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.61 vs. 
limit=15.0 2024-08-19 18:06:09,845 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 26 from LS+wenet, 19 from Vox, 25 from AS 2024-08-19 18:06:26,684 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.31 vs. limit=15.0 2024-08-19 18:06:35,594 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.71 vs. limit=12.0 2024-08-19 18:06:38,925 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4474690.0, ans=0.1 2024-08-19 18:06:39,155 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.76 vs. limit=12.0 2024-08-19 18:07:03,061 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4474790.0, ans=0.2 2024-08-19 18:07:11,017 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 2900, loss[loss=0.09454, beats_loss=0.01228, ecapa_loss=0.000125, whisper_loss=0.08101, over 18567.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01034, ecapa_loss=0.0001395, whisper_loss=0.09028, over 3734491.74 frames. ], batch size: 76, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:07:16,899 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4474890.0, ans=0.125 2024-08-19 18:07:45,858 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4475090.0, ans=0.125 2024-08-19 18:07:55,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4475090.0, ans=0.0 2024-08-19 18:08:06,268 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts.
20 from LS+wenet, 16 from Vox, 39 from AS 2024-08-19 18:08:07,941 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4475190.0, ans=0.125 2024-08-19 18:08:18,145 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=8.586e+00 2024-08-19 18:08:27,489 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.549e+01 2.227e+01 2.443e+01 2.748e+01 5.602e+01, threshold=4.887e+01, percent-clipped=1.0 2024-08-19 18:08:37,157 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 13 from LS+wenet, 16 from Vox, 22 from AS 2024-08-19 18:08:38,572 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 16 from LS+wenet, 22 from Vox, 22 from AS 2024-08-19 18:08:41,771 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 2950, loss[loss=0.105, beats_loss=0.01015, ecapa_loss=0.0001481, whisper_loss=0.09342, over 22071.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0104, ecapa_loss=0.0001405, whisper_loss=0.08973, over 3722500.15 frames. ], batch size: 92, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:09:09,675 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 31 from LS+wenet, 21 from Vox, 38 from AS 2024-08-19 18:09:23,614 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4475590.0, ans=0.125 2024-08-19 18:09:42,651 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 24 from LS+wenet, 14 from Vox, 38 from AS 2024-08-19 18:09:43,487 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4475690.0, ans=0.125 2024-08-19 18:09:50,969 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts.
17 from LS+wenet, 13 from Vox, 25 from AS 2024-08-19 18:10:12,307 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 3000, loss[loss=0.1075, beats_loss=0.01093, ecapa_loss=0.0001289, whisper_loss=0.09532, over 22333.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01033, ecapa_loss=0.0001413, whisper_loss=0.09014, over 3729166.32 frames. ], batch size: 88, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:10:12,307 INFO [train_multi_KD3.py:1140] (1/4) Computing validation loss 2024-08-19 18:10:48,109 INFO [train_multi_KD3.py:1150] (1/4) Epoch 31, validation on ASR_libri: loss=0.2543, beats_loss=0, ecapa_loss=0.0005052, whisper_loss=0.2492, over 931116.00 frames. 2024-08-19 18:10:54,559 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([1.6872, 1.1512, 1.6034, 1.4880, 1.7730, 1.3898, 1.4709, 1.5760], device='cuda:1') 2024-08-19 18:11:09,344 INFO [train_multi_KD3.py:1150] (1/4) Epoch 31, validation on SV_voxceleb1: loss=0.003946, beats_loss=0, ecapa_loss=0.0003946, whisper_loss=0, over 944235.00 frames. 2024-08-19 18:12:49,004 INFO [train_multi_KD3.py:1150] (1/4) Epoch 31, validation on AT_audioset: loss=0.02308, beats_loss=0.02308, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 18:12:49,008 INFO [train_multi_KD3.py:1156] (1/4) Maximum memory allocated so far is 30838MB 2024-08-19 18:12:49,245 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 22 from LS+wenet, 17 from Vox, 33 from AS 2024-08-19 18:13:36,620 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.421e+05 2024-08-19 18:13:36,666 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4476090.0, ans=0.0 2024-08-19 18:13:41,685 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts.
14 from LS+wenet, 12 from Vox, 25 from AS 2024-08-19 18:14:01,577 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4476290.0, ans=0.125 2024-08-19 18:14:02,442 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.325e+01 2.611e+01 2.866e+01 5.886e+01, threshold=5.222e+01, percent-clipped=1.0 2024-08-19 18:14:10,660 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4476290.0, ans=0.1 2024-08-19 18:14:16,849 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 3050, loss[loss=0.1291, beats_loss=0.011, ecapa_loss=0.0001667, whisper_loss=0.1164, over 22604.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01034, ecapa_loss=0.0001409, whisper_loss=0.09081, over 3769495.54 frames. ], batch size: 91, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:14:17,421 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4476390.0, ans=0.125 2024-08-19 18:14:23,143 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4476390.0, ans=0.125 2024-08-19 18:15:15,348 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4476690.0, ans=0.125 2024-08-19 18:15:15,349 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4476690.0, ans=0.125 2024-08-19 18:15:42,304 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4476790.0, ans=0.0 2024-08-19 18:15:50,661 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 3100, loss[loss=0.09465, beats_loss=0.01138, ecapa_loss=0.0001348, whisper_loss=0.08192, over 22063.00 frames.
], tot_loss[loss=0.1029, beats_loss=0.01038, ecapa_loss=0.000141, whisper_loss=0.09109, over 3808374.75 frames. ], batch size: 89, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:15:51,208 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4476890.0, ans=0.125 2024-08-19 18:16:06,265 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4476890.0, ans=0.0 2024-08-19 18:16:20,616 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4476990.0, ans=0.1 2024-08-19 18:16:53,320 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4477190.0, ans=0.0 2024-08-19 18:17:03,438 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4477290.0, ans=0.0 2024-08-19 18:17:07,073 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.384e+01 2.601e+01 2.875e+01 4.295e+01, threshold=5.202e+01, percent-clipped=0.0 2024-08-19 18:17:22,601 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 3150, loss[loss=0.1171, beats_loss=0.00989, ecapa_loss=0.0001763, whisper_loss=0.1055, over 16939.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01033, ecapa_loss=0.0001417, whisper_loss=0.09137, over 3818047.38 frames. ], batch size: 66, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:17:32,990 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4477390.0, ans=0.1 2024-08-19 18:17:41,537 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 
21 from LS+wenet, 11 from Vox, 40 from AS 2024-08-19 18:17:45,328 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4477490.0, ans=0.0 2024-08-19 18:17:52,191 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4477490.0, ans=0.125 2024-08-19 18:17:54,051 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4477490.0, ans=0.0 2024-08-19 18:18:08,664 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 37 from LS+wenet, 19 from Vox, 35 from AS 2024-08-19 18:18:09,158 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.44 vs. limit=15.0 2024-08-19 18:18:31,468 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 31 from LS+wenet, 24 from Vox, 35 from AS 2024-08-19 18:18:40,571 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.31 vs. limit=15.0 2024-08-19 18:18:43,940 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 14 from LS+wenet, 17 from Vox, 22 from AS 2024-08-19 18:18:52,529 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 3200, loss[loss=0.0899, beats_loss=0.01318, ecapa_loss=0.000142, whisper_loss=0.0753, over 17520.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01041, ecapa_loss=0.0001419, whisper_loss=0.09096, over 3817198.13 frames.
], batch size: 71, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:19:01,142 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4477890.0, ans=0.0 2024-08-19 18:19:33,211 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4478090.0, ans=0.0 2024-08-19 18:19:52,840 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4478190.0, ans=0.1 2024-08-19 18:19:55,159 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4478190.0, ans=0.1 2024-08-19 18:19:57,592 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.34 vs. limit=10.0 2024-08-19 18:19:59,934 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 16 from LS+wenet, 15 from Vox, 31 from AS 2024-08-19 18:20:05,465 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4478290.0, ans=0.2 2024-08-19 18:20:05,774 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.58 vs. limit=22.5 2024-08-19 18:20:07,953 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.317e+01 2.495e+01 2.835e+01 3.728e+01, threshold=4.990e+01, percent-clipped=0.0 2024-08-19 18:20:12,426 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.71 vs.
limit=15.0 2024-08-19 18:20:19,094 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=4478290.0, ans=0.5 2024-08-19 18:20:20,731 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4478390.0, ans=0.125 2024-08-19 18:20:21,887 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 3250, loss[loss=0.06922, beats_loss=0.01427, ecapa_loss=0.000103, whisper_loss=0.05392, over 18795.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01044, ecapa_loss=0.0001423, whisper_loss=0.09098, over 3819420.04 frames. ], batch size: 77, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:20:25,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4478390.0, ans=0.035 2024-08-19 18:20:44,375 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4478490.0, ans=0.0 2024-08-19 18:20:48,360 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4478490.0, ans=0.0 2024-08-19 18:20:55,286 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4478490.0, ans=0.1 2024-08-19 18:20:59,540 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4478590.0, ans=0.0 2024-08-19 18:21:01,835 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4478590.0, ans=0.07 2024-08-19 18:21:20,130 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=4478690.0, ans=0.02 2024-08-19 18:21:21,207 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 
25 from LS+wenet, 7 from Vox, 24 from AS 2024-08-19 18:21:24,629 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 21 from LS+wenet, 17 from Vox, 38 from AS 2024-08-19 18:21:34,347 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 30 from LS+wenet, 15 from Vox, 47 from AS 2024-08-19 18:21:54,148 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 3300, loss[loss=0.1147, beats_loss=0.009728, ecapa_loss=0.0001598, whisper_loss=0.1034, over 22201.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01037, ecapa_loss=0.000142, whisper_loss=0.09131, over 3793394.86 frames. ], batch size: 88, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:21:56,423 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4478890.0, ans=0.125 2024-08-19 18:22:08,666 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 25 from LS+wenet, 21 from Vox, 29 from AS 2024-08-19 18:22:14,952 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 18 from LS+wenet, 17 from Vox, 40 from AS 2024-08-19 18:22:16,451 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 19 from LS+wenet, 17 from Vox, 19 from AS 2024-08-19 18:22:17,769 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4478990.0, ans=0.125 2024-08-19 18:22:19,130 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.455e+01 2024-08-19 18:22:20,209 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 37 from LS+wenet, 15 from Vox, 40 from AS 2024-08-19 18:22:38,360 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts.
13 from LS+wenet, 17 from Vox, 25 from AS 2024-08-19 18:23:07,890 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.816e+01 2.244e+01 2.479e+01 2.933e+01 3.930e+01, threshold=4.957e+01, percent-clipped=0.0 2024-08-19 18:23:21,858 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4479390.0, ans=0.125 2024-08-19 18:23:23,204 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 3350, loss[loss=0.09964, beats_loss=0.01001, ecapa_loss=0.0001769, whisper_loss=0.08787, over 21940.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01034, ecapa_loss=0.0001423, whisper_loss=0.09162, over 3757071.05 frames. ], batch size: 92, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:23:32,350 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4479390.0, ans=0.125 2024-08-19 18:23:41,035 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4479490.0, ans=0.125 2024-08-19 18:24:04,491 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4479590.0, ans=0.1 2024-08-19 18:24:06,190 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4479590.0, ans=0.125 2024-08-19 18:24:11,996 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4479590.0, ans=0.125 2024-08-19 18:24:21,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4479690.0, ans=0.0 2024-08-19 18:24:49,875 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4479790.0, ans=0.2 2024-08-19 18:24:52,562 INFO
[train_multi_KD3.py:1117] (1/4) Epoch 31, batch 3400, loss[loss=0.09507, beats_loss=0.01152, ecapa_loss=0.000154, whisper_loss=0.082, over 19740.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0103, ecapa_loss=0.0001429, whisper_loss=0.09164, over 3793753.99 frames. ], batch size: 83, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:24:58,168 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4479890.0, ans=0.0 2024-08-19 18:25:13,875 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4479990.0, ans=0.125 2024-08-19 18:25:37,800 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4480090.0, ans=0.125 2024-08-19 18:25:37,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4480090.0, ans=0.1 2024-08-19 18:25:39,373 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4480090.0, ans=0.0 2024-08-19 18:25:47,715 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4480190.0, ans=0.125 2024-08-19 18:26:09,349 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.304e+01 2.534e+01 2.812e+01 7.025e+01, threshold=5.069e+01, percent-clipped=1.0 2024-08-19 18:26:14,138 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4480290.0, ans=0.05 2024-08-19 18:26:17,809 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4480290.0, ans=0.1 2024-08-19 18:26:21,309 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 
21 from LS+wenet, 29 from Vox, 42 from AS 2024-08-19 18:26:24,786 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 3450, loss[loss=0.08971, beats_loss=0.01037, ecapa_loss=0.0001434, whisper_loss=0.0779, over 15391.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01027, ecapa_loss=0.0001433, whisper_loss=0.09154, over 3782488.70 frames. ], batch size: 62, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:26:42,108 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 25 from LS+wenet, 14 from Vox, 34 from AS 2024-08-19 18:26:47,600 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.13 vs. limit=12.0 2024-08-19 18:26:51,009 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.16 vs. limit=12.0 2024-08-19 18:27:20,894 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4480690.0, ans=0.125 2024-08-19 18:27:35,696 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4480790.0, ans=0.0 2024-08-19 18:27:47,401 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4480790.0, ans=0.125 2024-08-19 18:27:50,097 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 3500, loss[loss=0.1143, beats_loss=0.01146, ecapa_loss=0.0001186, whisper_loss=0.1016, over 15902.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01022, ecapa_loss=0.0001441, whisper_loss=0.09145, over 3771252.93 frames. ], batch size: 62, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:28:22,479 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts.
26 from LS+wenet, 23 from Vox, 19 from AS 2024-08-19 18:28:29,804 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4481090.0, ans=0.125 2024-08-19 18:28:33,257 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4481090.0, ans=0.0 2024-08-19 18:28:40,418 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4481190.0, ans=0.05 2024-08-19 18:28:43,591 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4481190.0, ans=0.0 2024-08-19 18:28:54,911 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 29 from LS+wenet, 23 from Vox, 40 from AS 2024-08-19 18:29:01,327 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.742e+01 2.232e+01 2.458e+01 2.911e+01 6.376e+01, threshold=4.915e+01, percent-clipped=2.0 2024-08-19 18:29:11,331 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 18:29:12,997 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4481390.0, ans=0.0 2024-08-19 18:29:14,273 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 3550, loss[loss=0.09885, beats_loss=0.01139, ecapa_loss=0.0001305, whisper_loss=0.08615, over 22297.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01031, ecapa_loss=0.000144, whisper_loss=0.09093, over 3766341.46 frames.
], batch size: 91, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:29:26,693 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4481390.0, ans=0.125 2024-08-19 18:29:29,661 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4481490.0, ans=0.1 2024-08-19 18:29:50,706 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4481590.0, ans=0.125 2024-08-19 18:29:50,707 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4481590.0, ans=0.2 2024-08-19 18:29:55,108 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 11 from LS+wenet, 21 from Vox, 27 from AS 2024-08-19 18:30:03,408 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4481690.0, ans=0.125 2024-08-19 18:30:14,736 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 17 from LS+wenet, 25 from Vox, 26 from AS 2024-08-19 18:30:32,378 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 28 from LS+wenet, 15 from Vox, 44 from AS 2024-08-19 18:30:34,777 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 3600, loss[loss=0.0873, beats_loss=0.01251, ecapa_loss=0.0001106, whisper_loss=0.07368, over 23216.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01032, ecapa_loss=0.0001444, whisper_loss=0.09072, over 3795919.00 frames. ], batch size: 92, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:30:37,634 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.01 vs.
limit=6.0 2024-08-19 18:30:39,086 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4481890.0, ans=0.0 2024-08-19 18:30:41,726 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 36 from LS+wenet, 24 from Vox, 32 from AS 2024-08-19 18:30:43,539 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4481890.0, ans=0.05 2024-08-19 18:30:43,893 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.57 vs. limit=15.0 2024-08-19 18:30:47,207 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.36 vs. limit=15.0 2024-08-19 18:30:55,295 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4481990.0, ans=0.125 2024-08-19 18:31:03,690 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 26 from LS+wenet, 28 from Vox, 37 from AS 2024-08-19 18:31:10,756 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4482090.0, ans=0.125 2024-08-19 18:31:27,551 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 31 from LS+wenet, 25 from Vox, 36 from AS 2024-08-19 18:31:34,963 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.39 vs. limit=15.0 2024-08-19 18:31:42,014 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.727e+01 2.192e+01 2.433e+01 2.584e+01 3.997e+01, threshold=4.865e+01, percent-clipped=0.0 2024-08-19 18:31:54,779 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 3650, loss[loss=0.1084, beats_loss=0.007797, ecapa_loss=0.0001935, whisper_loss=0.09868, over 18226.00 frames.
], tot_loss[loss=0.1019, beats_loss=0.01031, ecapa_loss=0.0001438, whisper_loss=0.09014, over 3803993.09 frames. ], batch size: 78, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:32:01,560 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4482390.0, ans=0.125 2024-08-19 18:32:48,781 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.32 vs. limit=15.0 2024-08-19 18:33:14,972 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 3700, loss[loss=0.09284, beats_loss=0.01306, ecapa_loss=0.0001067, whisper_loss=0.07871, over 19721.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0103, ecapa_loss=0.0001431, whisper_loss=0.08987, over 3783681.69 frames. ], batch size: 77, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:33:16,856 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4482890.0, ans=0.1 2024-08-19 18:33:31,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4482990.0, ans=0.0 2024-08-19 18:33:48,883 INFO [train_multi_KD3.py:845] (1/4) A total of 54 cuts. 
12 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-19 18:34:01,582 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4483090.0, ans=0.0 2024-08-19 18:34:08,058 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4483090.0, ans=0.0 2024-08-19 18:34:14,213 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4483190.0, ans=0.125 2024-08-19 18:34:15,775 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4483190.0, ans=0.0 2024-08-19 18:34:28,933 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4483290.0, ans=0.125 2024-08-19 18:34:30,024 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.652e+01 2.276e+01 2.511e+01 2.757e+01 7.975e+01, threshold=5.022e+01, percent-clipped=3.0 2024-08-19 18:34:31,553 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 11 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-19 18:34:41,472 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 25 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-19 18:34:42,962 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 3750, loss[loss=0.09323, beats_loss=0.01097, ecapa_loss=0.0001792, whisper_loss=0.08047, over 20517.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01036, ecapa_loss=0.0001421, whisper_loss=0.09015, over 3786429.67 frames. ], batch size: 87, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:35:01,197 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 
21 from LS+wenet, 14 from Vox, 15 fro AS 2024-08-19 18:35:02,893 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4483490.0, ans=0.125 2024-08-19 18:35:04,621 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4483490.0, ans=0.125 2024-08-19 18:35:06,265 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4483490.0, ans=0.2 2024-08-19 18:35:12,184 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 29 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-19 18:35:20,394 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 26 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-19 18:35:32,043 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4483690.0, ans=0.125 2024-08-19 18:35:35,076 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 23 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-19 18:35:36,789 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 28 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-19 18:36:02,615 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4483890.0, ans=0.125 2024-08-19 18:36:03,416 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 3800, loss[loss=0.07437, beats_loss=0.01075, ecapa_loss=0.0001556, whisper_loss=0.06206, over 18614.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01032, ecapa_loss=0.0001418, whisper_loss=0.09031, over 3784450.36 frames. 
], batch size: 79, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:36:11,741 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4483890.0, ans=0.125 2024-08-19 18:36:18,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4483990.0, ans=0.125 2024-08-19 18:36:23,606 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 30 from LS+wenet, 27 from Vox, 23 fro AS 2024-08-19 18:36:28,383 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 16 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-19 18:36:30,457 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4483990.0, ans=0.1 2024-08-19 18:36:42,196 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.11 vs. limit=22.5 2024-08-19 18:36:48,239 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4484090.0, ans=0.1 2024-08-19 18:36:48,240 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4484090.0, ans=0.2 2024-08-19 18:37:05,667 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4484290.0, ans=0.125 2024-08-19 18:37:09,428 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.312e+01 2.559e+01 2.923e+01 4.060e+01, threshold=5.118e+01, percent-clipped=0.0 2024-08-19 18:37:13,682 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4484290.0, ans=0.125 2024-08-19 18:37:13,914 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, 
num_groups=1, num_channels=288, metric=7.28 vs. limit=10.0 2024-08-19 18:37:22,521 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 3850, loss[loss=0.07703, beats_loss=0.01198, ecapa_loss=0.0001416, whisper_loss=0.06363, over 20355.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01032, ecapa_loss=0.0001427, whisper_loss=0.09001, over 3787831.81 frames. ], batch size: 88, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:37:27,810 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4484390.0, ans=0.125 2024-08-19 18:37:50,454 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4484490.0, ans=0.5 2024-08-19 18:37:50,573 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.24 vs. limit=10.0 2024-08-19 18:37:59,250 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 21 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-19 18:38:00,692 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 15 from LS+wenet, 10 from Vox, 25 fro AS 2024-08-19 18:38:01,395 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4484590.0, ans=0.125 2024-08-19 18:38:02,383 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 13 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-19 18:38:10,184 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 
30 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-19 18:38:21,752 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4484690.0, ans=0.125 2024-08-19 18:38:32,588 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4484790.0, ans=0.1 2024-08-19 18:38:40,573 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 3900, loss[loss=0.09098, beats_loss=0.0108, ecapa_loss=0.000154, whisper_loss=0.07864, over 16849.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01037, ecapa_loss=0.0001432, whisper_loss=0.08901, over 3751844.09 frames. ], batch size: 68, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:38:40,924 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 32 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-19 18:38:44,239 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4484890.0, ans=0.125 2024-08-19 18:39:06,631 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4484990.0, ans=0.0 2024-08-19 18:39:08,050 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4484990.0, ans=0.125 2024-08-19 18:39:28,888 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4485190.0, ans=0.0 2024-08-19 18:39:47,820 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.325e+01 2.529e+01 2.804e+01 3.948e+01, threshold=5.059e+01, percent-clipped=0.0 2024-08-19 18:39:59,869 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4485390.0, ans=0.125 2024-08-19 18:40:00,819 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 
3950, loss[loss=0.1042, beats_loss=0.008266, ecapa_loss=0.0002102, whisper_loss=0.09384, over 16893.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01029, ecapa_loss=0.0001435, whisper_loss=0.09085, over 3807984.33 frames. ], batch size: 74, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:40:01,599 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.01 vs. limit=15.0 2024-08-19 18:40:34,187 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 18:40:38,893 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4485590.0, ans=0.0 2024-08-19 18:40:45,279 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4485590.0, ans=0.125 2024-08-19 18:41:22,593 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 4000, loss[loss=0.08681, beats_loss=0.01248, ecapa_loss=0.0001319, whisper_loss=0.07301, over 22755.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01039, ecapa_loss=0.0001435, whisper_loss=0.09069, over 3858645.66 frames. ], batch size: 93, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:41:58,624 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=4486090.0, ans=0.5 2024-08-19 18:42:15,563 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.44 vs. 
limit=15.0 2024-08-19 18:42:29,811 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.772e+01 2.300e+01 2.585e+01 3.012e+01 4.802e+01, threshold=5.170e+01, percent-clipped=0.0 2024-08-19 18:42:42,330 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 4050, loss[loss=0.1196, beats_loss=0.009617, ecapa_loss=0.000131, whisper_loss=0.1087, over 17472.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01035, ecapa_loss=0.0001431, whisper_loss=0.09139, over 3892789.82 frames. ], batch size: 69, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:42:52,017 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 27 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-19 18:43:17,228 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4486590.0, ans=0.125 2024-08-19 18:43:20,362 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 16 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-19 18:43:41,527 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4486690.0, ans=0.125 2024-08-19 18:43:43,103 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4486690.0, ans=0.04949747468305833 2024-08-19 18:43:46,357 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=4486790.0, ans=0.05 2024-08-19 18:44:01,532 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 4100, loss[loss=0.1064, beats_loss=0.01256, ecapa_loss=0.0001166, whisper_loss=0.09265, over 21996.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01041, ecapa_loss=0.0001434, whisper_loss=0.09086, over 3876972.06 frames. 
], batch size: 87, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:44:12,130 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.48 vs. limit=15.0 2024-08-19 18:44:21,140 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4486990.0, ans=0.125 2024-08-19 18:44:21,194 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4486990.0, ans=0.125 2024-08-19 18:45:07,348 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.401e+01 2.726e+01 3.123e+01 1.504e+02, threshold=5.451e+01, percent-clipped=2.0 2024-08-19 18:45:20,294 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 4150, loss[loss=0.08614, beats_loss=0.01349, ecapa_loss=0.0001249, whisper_loss=0.0714, over 21933.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01043, ecapa_loss=0.0001429, whisper_loss=0.0909, over 3851007.74 frames. ], batch size: 89, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:45:20,880 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4487390.0, ans=0.125 2024-08-19 18:45:28,865 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4487390.0, ans=0.125 2024-08-19 18:45:32,205 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4487390.0, ans=0.0 2024-08-19 18:45:38,563 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4487490.0, ans=0.125 2024-08-19 18:45:44,418 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 
26 from LS+wenet, 26 from Vox, 20 fro AS 2024-08-19 18:45:46,016 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4487490.0, ans=0.0 2024-08-19 18:45:46,500 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.09 vs. limit=15.0 2024-08-19 18:45:48,968 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 19 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-19 18:45:51,970 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 18 from LS+wenet, 13 from Vox, 20 fro AS 2024-08-19 18:45:55,540 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4487590.0, ans=0.2 2024-08-19 18:45:57,250 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4487590.0, ans=0.125 2024-08-19 18:45:58,858 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4487590.0, ans=0.125 2024-08-19 18:46:07,173 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4487690.0, ans=0.1 2024-08-19 18:46:08,164 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 13 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-19 18:46:23,434 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4487790.0, ans=0.0 2024-08-19 18:46:40,452 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 4200, loss[loss=0.09957, beats_loss=0.01171, ecapa_loss=0.0001315, whisper_loss=0.08654, over 22185.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01046, ecapa_loss=0.0001427, whisper_loss=0.09069, over 3828096.86 frames. 
], batch size: 91, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:46:46,074 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 13 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-19 18:46:50,875 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4487890.0, ans=0.1 2024-08-19 18:46:57,954 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4487990.0, ans=0.1 2024-08-19 18:46:59,239 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 23 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-19 18:47:22,885 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 24 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-19 18:47:34,058 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4488190.0, ans=0.125 2024-08-19 18:47:49,586 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.791e+01 2.234e+01 2.488e+01 2.803e+01 1.323e+02, threshold=4.977e+01, percent-clipped=2.0 2024-08-19 18:48:02,443 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 4250, loss[loss=0.0998, beats_loss=0.01189, ecapa_loss=0.0001398, whisper_loss=0.08651, over 20054.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01044, ecapa_loss=0.0001424, whisper_loss=0.09079, over 3790153.26 frames. ], batch size: 83, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:48:07,663 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4488390.0, ans=0.1 2024-08-19 18:48:14,052 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 22 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-19 18:48:18,299 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.10 vs. 
limit=12.0 2024-08-19 18:48:27,639 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.54 vs. limit=10.0 2024-08-19 18:49:05,308 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 19 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-19 18:49:22,716 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 4300, loss[loss=0.1061, beats_loss=0.01028, ecapa_loss=0.0001564, whisper_loss=0.09423, over 15626.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01045, ecapa_loss=0.0001417, whisper_loss=0.09041, over 3797793.01 frames. ], batch size: 63, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:49:31,169 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4488890.0, ans=0.125 2024-08-19 18:49:39,437 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4488990.0, ans=0.0 2024-08-19 18:49:43,144 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.16 vs. limit=15.0 2024-08-19 18:49:58,352 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 15 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-19 18:49:59,183 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4489090.0, ans=0.125 2024-08-19 18:50:08,953 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4489090.0, ans=0.0 2024-08-19 18:50:13,510 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 
26 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-19 18:50:15,438 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4489190.0, ans=0.0 2024-08-19 18:50:30,848 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.303e+01 2.487e+01 2.877e+01 4.114e+01, threshold=4.973e+01, percent-clipped=0.0 2024-08-19 18:50:43,617 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 4350, loss[loss=0.09941, beats_loss=0.01146, ecapa_loss=0.0001146, whisper_loss=0.08681, over 21630.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01049, ecapa_loss=0.0001398, whisper_loss=0.09028, over 3836307.15 frames. ], batch size: 86, lr: 1.98e-03, grad_scale: 1.152921504606847e+18 2024-08-19 18:50:48,150 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.79 vs. limit=15.0 2024-08-19 18:50:52,530 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 17 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-19 18:50:52,734 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4489390.0, ans=0.04949747468305833 2024-08-19 18:51:19,033 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4489590.0, ans=0.0 2024-08-19 18:51:26,801 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.27 vs. limit=22.5 2024-08-19 18:51:31,350 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4489690.0, ans=0.1 2024-08-19 18:51:34,043 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 
32 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-19 18:51:37,893 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.51 vs. limit=15.0 2024-08-19 18:51:50,355 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.86 vs. limit=15.0 2024-08-19 18:51:51,968 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4489790.0, ans=0.125 2024-08-19 18:52:03,947 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 4400, loss[loss=0.0815, beats_loss=0.01114, ecapa_loss=0.0001146, whisper_loss=0.06922, over 16307.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01043, ecapa_loss=0.0001399, whisper_loss=0.09071, over 3833687.03 frames. ], batch size: 63, lr: 1.98e-03, grad_scale: 1.152921504606847e+18 2024-08-19 18:52:08,057 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.02 vs. limit=22.5 2024-08-19 18:52:20,584 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4489990.0, ans=0.0 2024-08-19 18:52:31,953 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 20 from LS+wenet, 20 from Vox, 17 fro AS 2024-08-19 18:52:45,081 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4490090.0, ans=0.1 2024-08-19 18:53:00,476 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 
37 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-19 18:53:11,423 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.271e+01 2.455e+01 2.760e+01 4.090e+01, threshold=4.910e+01, percent-clipped=0.0 2024-08-19 18:53:12,085 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4490290.0, ans=0.125 2024-08-19 18:53:20,994 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 29 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-19 18:53:22,978 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=4490390.0, ans=0.05 2024-08-19 18:53:23,762 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 4450, loss[loss=0.1058, beats_loss=0.008582, ecapa_loss=0.0001999, whisper_loss=0.09525, over 21734.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01044, ecapa_loss=0.0001419, whisper_loss=0.09021, over 3826161.72 frames. ], batch size: 91, lr: 1.98e-03, grad_scale: 1.152921504606847e+18 2024-08-19 18:53:35,396 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 15 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-19 18:53:48,474 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4490490.0, ans=0.125 2024-08-19 18:53:51,036 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 17 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-19 18:54:17,462 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4490690.0, ans=0.0 2024-08-19 18:54:24,580 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.30 vs. 
limit=15.0 2024-08-19 18:54:35,625 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.57 vs. limit=15.0 2024-08-19 18:54:44,726 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 4500, loss[loss=0.09455, beats_loss=0.008805, ecapa_loss=0.0001583, whisper_loss=0.08416, over 14488.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01042, ecapa_loss=0.0001421, whisper_loss=0.08986, over 3817153.13 frames. ], batch size: 60, lr: 1.98e-03, grad_scale: 1.152921504606847e+18 2024-08-19 18:54:58,900 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4490890.0, ans=0.0 2024-08-19 18:55:14,033 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 22 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-19 18:55:34,807 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 16 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-19 18:55:54,986 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.683e+01 2.263e+01 2.559e+01 2.809e+01 3.466e+01, threshold=5.118e+01, percent-clipped=0.0 2024-08-19 18:56:07,818 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 4550, loss[loss=0.09614, beats_loss=0.01064, ecapa_loss=0.0001642, whisper_loss=0.08386, over 21672.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01037, ecapa_loss=0.000142, whisper_loss=0.08997, over 3812244.61 frames. ], batch size: 92, lr: 1.98e-03, grad_scale: 1.152921504606847e+18 2024-08-19 18:56:17,300 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.20 vs. 
limit=22.5 2024-08-19 18:56:22,284 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4491390.0, ans=0.125 2024-08-19 18:56:37,428 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4491490.0, ans=0.125 2024-08-19 18:56:42,344 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4491590.0, ans=0.2 2024-08-19 18:57:00,919 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4491690.0, ans=0.125 2024-08-19 18:57:16,170 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 20 from LS+wenet, 32 from Vox, 32 fro AS 2024-08-19 18:57:17,915 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4491790.0, ans=0.07 2024-08-19 18:57:26,777 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4491790.0, ans=0.1 2024-08-19 18:57:32,997 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 4600, loss[loss=0.08481, beats_loss=0.01241, ecapa_loss=0.0001167, whisper_loss=0.07123, over 16666.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01038, ecapa_loss=0.0001418, whisper_loss=0.09024, over 3801499.66 frames. ], batch size: 68, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:57:49,602 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4491990.0, ans=0.0 2024-08-19 18:57:56,616 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4491990.0, ans=0.0 2024-08-19 18:58:16,165 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 
26 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-19 18:58:24,556 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 17 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-19 18:58:28,089 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4492190.0, ans=0.125 2024-08-19 18:58:41,703 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4492290.0, ans=0.0 2024-08-19 18:58:46,006 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.297e+01 2.492e+01 2.828e+01 4.082e+01, threshold=4.984e+01, percent-clipped=0.0 2024-08-19 18:58:56,439 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=4492390.0, ans=0.95 2024-08-19 18:58:56,455 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4492390.0, ans=0.0 2024-08-19 18:58:57,897 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 4650, loss[loss=0.06954, beats_loss=0.01397, ecapa_loss=0.0001303, whisper_loss=0.05426, over 16193.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01033, ecapa_loss=0.0001414, whisper_loss=0.09091, over 3787621.66 frames. ], batch size: 67, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:59:08,875 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 23 from LS+wenet, 19 from Vox, 48 fro AS 2024-08-19 18:59:14,013 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4492490.0, ans=0.125 2024-08-19 18:59:58,963 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 30 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-19 19:00:18,202 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 
19 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-19 19:00:18,430 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4492790.0, ans=0.0 2024-08-19 19:00:22,848 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 4700, loss[loss=0.0952, beats_loss=0.01255, ecapa_loss=0.0001294, whisper_loss=0.08136, over 21439.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01036, ecapa_loss=0.0001407, whisper_loss=0.09076, over 3810707.60 frames. ], batch size: 88, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:00:48,448 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4492990.0, ans=0.125 2024-08-19 19:00:53,780 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4492990.0, ans=0.0 2024-08-19 19:00:58,150 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 29 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-19 19:01:00,310 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4493090.0, ans=0.0 2024-08-19 19:01:23,420 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 11 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-19 19:01:25,372 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4493190.0, ans=0.125 2024-08-19 19:01:28,534 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 22 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-19 19:01:34,506 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.345e+01 2.552e+01 2.786e+01 4.462e+01, threshold=5.104e+01, percent-clipped=0.0 2024-08-19 19:01:36,560 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 
24 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-19 19:01:43,341 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.18 vs. limit=15.0 2024-08-19 19:01:45,838 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 4750, loss[loss=0.09993, beats_loss=0.01166, ecapa_loss=0.0001344, whisper_loss=0.08693, over 15297.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01042, ecapa_loss=0.0001424, whisper_loss=0.09006, over 3786409.18 frames. ], batch size: 62, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:02:00,935 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4493490.0, ans=0.0 2024-08-19 19:02:09,491 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4493490.0, ans=0.125 2024-08-19 19:02:12,741 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4493490.0, ans=0.0 2024-08-19 19:02:19,526 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4493590.0, ans=0.2 2024-08-19 19:02:21,175 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4493590.0, ans=0.0 2024-08-19 19:02:39,235 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4493690.0, ans=0.125 2024-08-19 19:02:46,449 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4493690.0, ans=0.125 2024-08-19 19:03:01,873 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 19:03:05,007 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 
23 from LS+wenet, 21 from Vox, 48 fro AS 2024-08-19 19:03:06,741 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 21 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-19 19:03:09,212 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.44 vs. limit=15.0 2024-08-19 19:03:09,639 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 4800, loss[loss=0.09277, beats_loss=0.01031, ecapa_loss=0.000126, whisper_loss=0.0812, over 22325.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01035, ecapa_loss=0.0001436, whisper_loss=0.09028, over 3828225.18 frames. ], batch size: 86, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:03:16,595 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4493890.0, ans=0.0 2024-08-19 19:03:38,239 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.12 vs. limit=15.0 2024-08-19 19:03:59,914 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 32 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-19 19:04:03,965 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4494190.0, ans=0.125 2024-08-19 19:04:15,589 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 
25 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-19 19:04:21,471 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.336e+01 2.600e+01 2.820e+01 4.344e+01, threshold=5.200e+01, percent-clipped=0.0 2024-08-19 19:04:22,594 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4494290.0, ans=0.0 2024-08-19 19:04:33,251 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 4850, loss[loss=0.1213, beats_loss=0.009677, ecapa_loss=0.0001641, whisper_loss=0.11, over 22271.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01038, ecapa_loss=0.0001433, whisper_loss=0.08941, over 3803841.59 frames. ], batch size: 90, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:04:42,221 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.09 vs. limit=15.0 2024-08-19 19:04:52,104 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4494490.0, ans=0.0 2024-08-19 19:04:53,947 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4494490.0, ans=0.0 2024-08-19 19:04:54,899 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 17 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-19 19:05:00,200 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 22 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-19 19:05:02,859 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.49 vs. 
limit=15.0 2024-08-19 19:05:12,558 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4494590.0, ans=0.1 2024-08-19 19:05:55,591 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4494890.0, ans=0.125 2024-08-19 19:05:56,416 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 4900, loss[loss=0.0903, beats_loss=0.01073, ecapa_loss=0.0001527, whisper_loss=0.07805, over 21451.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.0104, ecapa_loss=0.0001442, whisper_loss=0.08921, over 3807527.94 frames. ], batch size: 90, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:06:48,407 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 20 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-19 19:06:51,940 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4495190.0, ans=0.1 2024-08-19 19:06:53,405 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4495190.0, ans=0.0 2024-08-19 19:07:02,937 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4495190.0, ans=0.125 2024-08-19 19:07:10,327 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.336e+01 2.530e+01 2.860e+01 1.367e+02, threshold=5.061e+01, percent-clipped=1.0 2024-08-19 19:07:11,471 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.49 vs. 
limit=5.0 2024-08-19 19:07:14,715 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4495290.0, ans=0.0 2024-08-19 19:07:16,513 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4495290.0, ans=0.125 2024-08-19 19:07:22,614 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 4950, loss[loss=0.1209, beats_loss=0.01122, ecapa_loss=0.0001467, whisper_loss=0.1083, over 23071.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.0105, ecapa_loss=0.0001435, whisper_loss=0.08852, over 3798198.64 frames. ], batch size: 91, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:07:30,920 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 34 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-19 19:07:53,899 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4495490.0, ans=0.125 2024-08-19 19:08:17,406 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 36 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-19 19:08:19,843 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.whiten.whitening_limit, batch_count=4495690.0, ans=12.0 2024-08-19 19:08:30,482 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 20 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-19 19:08:39,843 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4495790.0, ans=0.125 2024-08-19 19:08:39,868 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4495790.0, ans=0.1 2024-08-19 19:08:49,523 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 5000, loss[loss=0.1038, beats_loss=0.011, ecapa_loss=0.0001197, whisper_loss=0.0916, over 14923.00 frames. 
], tot_loss[loss=0.1013, beats_loss=0.01051, ecapa_loss=0.000143, whisper_loss=0.08934, over 3822871.14 frames. ], batch size: 57, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:08:55,111 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4495890.0, ans=0.125 2024-08-19 19:08:58,093 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 25 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-19 19:09:17,453 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.61 vs. limit=15.0 2024-08-19 19:09:29,852 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.47 vs. limit=12.0 2024-08-19 19:09:51,156 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 26 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-19 19:10:04,539 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.356e+01 2.547e+01 2.785e+01 7.027e+01, threshold=5.094e+01, percent-clipped=1.0 2024-08-19 19:10:16,956 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 5050, loss[loss=0.09322, beats_loss=0.01022, ecapa_loss=0.0001275, whisper_loss=0.08172, over 21344.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01054, ecapa_loss=0.0001415, whisper_loss=0.08891, over 3838838.19 frames. ], batch size: 86, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:10:20,863 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.19 vs. 
limit=22.5 2024-08-19 19:10:21,959 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4496390.0, ans=0.0 2024-08-19 19:10:23,485 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4496390.0, ans=0.125 2024-08-19 19:10:33,315 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 26 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-19 19:10:36,965 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 19 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-19 19:11:05,623 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 20 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-19 19:11:09,146 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 19 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-19 19:11:35,747 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 16 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-19 19:11:42,208 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 5100, loss[loss=0.0882, beats_loss=0.01274, ecapa_loss=0.0001185, whisper_loss=0.07427, over 21758.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01057, ecapa_loss=0.0001407, whisper_loss=0.08891, over 3855619.49 frames. ], batch size: 87, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:11:55,138 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.13 vs. limit=5.0 2024-08-19 19:12:01,325 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.87 vs. limit=15.0 2024-08-19 19:12:32,214 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 
23 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-19 19:12:50,406 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4497290.0, ans=0.0 2024-08-19 19:12:53,503 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.769e+01 2.293e+01 2.514e+01 2.831e+01 4.907e+01, threshold=5.028e+01, percent-clipped=0.0 2024-08-19 19:13:05,686 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 5150, loss[loss=0.102, beats_loss=0.01086, ecapa_loss=0.0001304, whisper_loss=0.08981, over 22113.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01066, ecapa_loss=0.0001416, whisper_loss=0.08892, over 3845993.07 frames. ], batch size: 90, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:13:44,424 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.57 vs. limit=15.0 2024-08-19 19:14:02,945 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 12 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-19 19:14:12,975 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 21 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-19 19:14:33,135 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 5200, loss[loss=0.11, beats_loss=0.01101, ecapa_loss=0.0001735, whisper_loss=0.09728, over 22994.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0106, ecapa_loss=0.0001414, whisper_loss=0.08944, over 3832821.45 frames. 
], batch size: 95, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:14:33,769 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4497890.0, ans=0.125 2024-08-19 19:14:44,486 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4497890.0, ans=0.5 2024-08-19 19:15:00,878 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4497990.0, ans=0.125 2024-08-19 19:15:22,715 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4498190.0, ans=0.125 2024-08-19 19:15:46,203 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.272e+01 2.588e+01 2.852e+01 4.438e+01, threshold=5.176e+01, percent-clipped=0.0 2024-08-19 19:15:49,720 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 24 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-19 19:15:50,414 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.64 vs. limit=12.0 2024-08-19 19:15:57,701 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 5250, loss[loss=0.103, beats_loss=0.009678, ecapa_loss=0.0001306, whisper_loss=0.09202, over 17196.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01051, ecapa_loss=0.000142, whisper_loss=0.08966, over 3807866.25 frames. ], batch size: 64, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:16:06,396 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 22 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-19 19:16:08,365 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4498390.0, ans=0.0 2024-08-19 19:16:22,575 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 
16 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-19 19:16:31,494 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4498590.0, ans=0.0 2024-08-19 19:16:31,549 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4498590.0, ans=0.1 2024-08-19 19:16:37,857 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 28 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-19 19:16:40,861 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 26 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-19 19:16:41,111 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4498590.0, ans=0.125 2024-08-19 19:16:52,790 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 22 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-19 19:17:09,217 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 12 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-19 19:17:19,622 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 5300, loss[loss=0.09126, beats_loss=0.00864, ecapa_loss=0.0001966, whisper_loss=0.08065, over 17789.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01043, ecapa_loss=0.0001419, whisper_loss=0.08959, over 3757455.45 frames. ], batch size: 76, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:17:24,986 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 
16 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-19 19:17:32,069 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4498890.0, ans=0.125 2024-08-19 19:17:34,190 WARNING [optim.py:496] (1/4) Scaling gradients by 0.07655883580446243, model_norm_threshold=51.76279067993164 2024-08-19 19:17:34,347 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.32, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.480e+05, grad_sumsq=1.406e+07, orig_rms_sq=1.053e-02 2024-08-19 19:18:17,481 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4499190.0, ans=0.0 2024-08-19 19:18:26,056 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4499290.0, ans=0.125 2024-08-19 19:18:30,310 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.660e+01 2.288e+01 2.528e+01 2.946e+01 6.761e+02, threshold=5.056e+01, percent-clipped=1.0 2024-08-19 19:18:30,536 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 19 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-19 19:18:42,118 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 5350, loss[loss=0.07452, beats_loss=0.009458, ecapa_loss=0.0001713, whisper_loss=0.06335, over 16632.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01048, ecapa_loss=0.0001417, whisper_loss=0.08933, over 3782330.51 frames. ], batch size: 68, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:18:50,692 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 
30 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-19 19:18:57,364 WARNING [optim.py:496] (1/4) Scaling gradients by 0.05797187611460686, model_norm_threshold=50.55705261230469 2024-08-19 19:18:57,523 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.4.encoder.layers.0.norm.log_scale with proportion 0.18, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.406e+05, grad_sumsq=1.406e+05, orig_rms_sq=1.000e+00 2024-08-19 19:19:07,103 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4499490.0, ans=10.0 2024-08-19 19:19:22,080 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4499590.0, ans=0.125 2024-08-19 19:19:27,824 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4499590.0, ans=0.125 2024-08-19 19:19:29,074 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 20 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-19 19:19:46,027 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4499690.0, ans=0.125 2024-08-19 19:19:53,043 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 36 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-19 19:19:57,648 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.98 vs. limit=15.0 2024-08-19 19:20:08,797 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4499790.0, ans=0.125 2024-08-19 19:20:13,377 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 5400, loss[loss=0.07567, beats_loss=0.01117, ecapa_loss=0.0001442, whisper_loss=0.06305, over 19986.00 frames. 
], tot_loss[loss=0.1015, beats_loss=0.01049, ecapa_loss=0.0001419, whisper_loss=0.08961, over 3817938.02 frames. ], batch size: 82, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:20:30,761 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 8 from LS+wenet, 11 from Vox, 40 fro AS 2024-08-19 19:20:36,114 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 29 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-19 19:20:47,364 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4500090.0, ans=0.1 2024-08-19 19:21:07,262 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4500190.0, ans=0.05 2024-08-19 19:21:19,897 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.38 vs. limit=15.0 2024-08-19 19:21:27,595 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.273e+01 2.608e+01 3.002e+01 8.721e+02, threshold=5.217e+01, percent-clipped=3.0 2024-08-19 19:21:34,719 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 19 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-19 19:21:35,226 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4500290.0, ans=0.1 2024-08-19 19:21:39,178 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 5450, loss[loss=0.1087, beats_loss=0.008885, ecapa_loss=0.0001278, whisper_loss=0.09855, over 16842.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01047, ecapa_loss=0.0001409, whisper_loss=0.08922, over 3773909.37 frames. 
], batch size: 64, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:22:17,362 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4500590.0, ans=0.125 2024-08-19 19:22:31,851 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.174e-03 2024-08-19 19:22:37,277 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4500690.0, ans=0.125 2024-08-19 19:22:51,361 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 25 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-19 19:22:51,552 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4500790.0, ans=0.035 2024-08-19 19:23:08,979 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 5500, loss[loss=0.1022, beats_loss=0.01009, ecapa_loss=0.0001236, whisper_loss=0.09086, over 19499.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01045, ecapa_loss=0.0001394, whisper_loss=0.08943, over 3798178.08 frames. ], batch size: 73, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:23:28,768 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 
26 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-19 19:23:55,122 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4501090.0, ans=0.0 2024-08-19 19:23:56,387 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4501090.0, ans=0.2 2024-08-19 19:24:11,091 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4501190.0, ans=0.2 2024-08-19 19:24:25,792 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.236e+01 2.438e+01 2.713e+01 9.093e+01, threshold=4.875e+01, percent-clipped=1.0 2024-08-19 19:24:38,278 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 18 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-19 19:24:38,515 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4501390.0, ans=0.0 2024-08-19 19:24:39,708 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 5550, loss[loss=0.07862, beats_loss=0.01072, ecapa_loss=0.0001636, whisper_loss=0.06626, over 19307.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01044, ecapa_loss=0.0001402, whisper_loss=0.08947, over 3779747.83 frames. ], batch size: 81, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:24:47,856 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.71 vs. limit=15.0 2024-08-19 19:24:52,502 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=4501390.0, ans=0.2 2024-08-19 19:24:53,979 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 28 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-19 19:25:12,291 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 
25 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-19 19:25:25,940 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4501590.0, ans=0.125 2024-08-19 19:25:42,073 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 22 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-19 19:25:49,542 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 27 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-19 19:25:55,971 INFO [train_multi_KD3.py:845] (1/4) A total of 97 cuts. 31 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-19 19:26:02,880 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 21 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-19 19:26:08,088 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4501790.0, ans=0.1 2024-08-19 19:26:15,402 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 5600, loss[loss=0.1231, beats_loss=0.007401, ecapa_loss=0.0001674, whisper_loss=0.1141, over 12953.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01037, ecapa_loss=0.0001409, whisper_loss=0.08973, over 3773998.33 frames. ], batch size: 50, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:26:15,941 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4501890.0, ans=0.125 2024-08-19 19:27:26,927 INFO [train_multi_KD3.py:845] (1/4) A total of 52 cuts. 
18 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-19 19:27:31,095 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4502290.0, ans=0.0 2024-08-19 19:27:37,764 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.669e+01 2.290e+01 2.503e+01 2.698e+01 5.557e+01, threshold=5.007e+01, percent-clipped=1.0 2024-08-19 19:27:42,621 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4502290.0, ans=0.125 2024-08-19 19:27:51,977 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 5650, loss[loss=0.1191, beats_loss=0.01098, ecapa_loss=0.0001193, whisper_loss=0.1069, over 19067.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0103, ecapa_loss=0.0001412, whisper_loss=0.09083, over 3820927.62 frames. ], batch size: 71, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:27:54,284 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4502390.0, ans=0.1 2024-08-19 19:29:01,094 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 26 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-19 19:29:07,829 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 24 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-19 19:29:27,594 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 5700, loss[loss=0.1031, beats_loss=0.01152, ecapa_loss=0.0001556, whisper_loss=0.09003, over 21487.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01033, ecapa_loss=0.000141, whisper_loss=0.09062, over 3818177.79 frames. 
], batch size: 88, lr: 1.98e-03, grad_scale: 5.764607523034235e+17
2024-08-19 19:30:00,833 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4502990.0, ans=0.0
2024-08-19 19:30:10,760 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.52 vs. limit=22.5
2024-08-19 19:30:21,001 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4503090.0, ans=0.0
2024-08-19 19:30:31,827 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.25 vs. limit=12.0
2024-08-19 19:30:45,580 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.27 vs. limit=22.5
2024-08-19 19:30:47,113 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 25 from LS+wenet, 25 from Vox, 30 from AS
2024-08-19 19:30:51,129 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.276e+01 2.546e+01 2.979e+01 5.244e+01, threshold=5.092e+01, percent-clipped=1.0
2024-08-19 19:30:51,456 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 27 from LS+wenet, 26 from Vox, 35 from AS
2024-08-19 19:30:51,682 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4503290.0, ans=0.0
2024-08-19 19:31:04,801 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 5750, loss[loss=0.1136, beats_loss=0.009554, ecapa_loss=0.0001493, whisper_loss=0.1025, over 22780.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01029, ecapa_loss=0.0001426, whisper_loss=0.09111, over 3841684.26 frames. ], batch size: 94, lr: 1.98e-03, grad_scale: 5.764607523034235e+17
2024-08-19 19:31:04,988 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 18 from LS+wenet, 20 from Vox, 27 from AS
2024-08-19 19:31:10,375 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4503390.0, ans=0.1
2024-08-19 19:31:39,039 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4503490.0, ans=0.0
2024-08-19 19:31:46,127 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 23 from LS+wenet, 21 from Vox, 40 from AS
2024-08-19 19:31:55,806 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.758e+01
2024-08-19 19:32:16,720 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 13 from LS+wenet, 18 from Vox, 33 from AS
2024-08-19 19:32:35,818 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 5800, loss[loss=0.09663, beats_loss=0.01057, ecapa_loss=0.0001469, whisper_loss=0.08459, over 22812.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01035, ecapa_loss=0.0001423, whisper_loss=0.09041, over 3871720.14 frames. ], batch size: 92, lr: 1.98e-03, grad_scale: 5.764607523034235e+17
2024-08-19 19:32:45,758 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4503890.0, ans=0.025
2024-08-19 19:32:47,417 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.17 vs. limit=6.0
2024-08-19 19:32:47,622 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.86 vs. limit=15.0
2024-08-19 19:32:53,066 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 29 from LS+wenet, 17 from Vox, 26 from AS
2024-08-19 19:32:55,823 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4503990.0, ans=0.125
2024-08-19 19:32:56,065 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=4503990.0, ans=22.5
2024-08-19 19:33:01,633 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.76 vs. limit=15.0
2024-08-19 19:33:26,652 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4504090.0, ans=0.0
2024-08-19 19:33:39,477 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 28 from LS+wenet, 14 from Vox, 41 from AS
2024-08-19 19:33:49,090 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4504190.0, ans=0.1
2024-08-19 19:33:51,120 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4504290.0, ans=0.1
2024-08-19 19:33:58,184 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.339e+01 2.561e+01 2.956e+01 4.463e+01, threshold=5.121e+01, percent-clipped=0.0
2024-08-19 19:34:09,787 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4504390.0, ans=0.0
2024-08-19 19:34:11,252 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 5850, loss[loss=0.1167, beats_loss=0.008577, ecapa_loss=0.0001293, whisper_loss=0.1069, over 19619.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01032, ecapa_loss=0.0001421, whisper_loss=0.09075, over 3872543.58 frames. ], batch size: 74, lr: 1.98e-03, grad_scale: 5.764607523034235e+17
2024-08-19 19:34:11,685 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4504390.0, ans=0.05
2024-08-19 19:34:37,329 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4504490.0, ans=0.0
2024-08-19 19:35:08,018 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4504690.0, ans=0.125
2024-08-19 19:35:26,209 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=4504790.0, ans=0.025
2024-08-19 19:35:26,590 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.23 vs. limit=10.0
2024-08-19 19:35:44,443 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 5900, loss[loss=0.1146, beats_loss=0.01061, ecapa_loss=0.0001397, whisper_loss=0.1026, over 20883.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01044, ecapa_loss=0.0001416, whisper_loss=0.08989, over 3829273.15 frames. ], batch size: 83, lr: 1.98e-03, grad_scale: 5.764607523034235e+17
2024-08-19 19:36:23,743 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4505090.0, ans=0.125
2024-08-19 19:36:33,953 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4505090.0, ans=0.125
2024-08-19 19:36:37,930 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.49 vs. limit=22.5
2024-08-19 19:36:38,130 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.11 vs. limit=10.0
2024-08-19 19:36:49,321 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 26 from LS+wenet, 20 from Vox, 29 from AS
2024-08-19 19:36:49,852 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.28 vs. limit=15.0
2024-08-19 19:37:09,565 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.259e+01 2.428e+01 2.766e+01 1.765e+02, threshold=4.857e+01, percent-clipped=1.0
2024-08-19 19:37:19,167 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 23 from LS+wenet, 34 from Vox, 37 from AS
2024-08-19 19:37:22,231 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4505390.0, ans=0.1
2024-08-19 19:37:23,693 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 5950, loss[loss=0.1074, beats_loss=0.01154, ecapa_loss=0.0001457, whisper_loss=0.0944, over 21300.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01042, ecapa_loss=0.0001409, whisper_loss=0.08992, over 3858487.53 frames. ], batch size: 86, lr: 1.98e-03, grad_scale: 5.764607523034235e+17
2024-08-19 19:37:27,083 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4505390.0, ans=0.125
2024-08-19 19:37:43,319 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.08 vs. limit=10.0
2024-08-19 19:38:13,023 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4505590.0, ans=0.125
2024-08-19 19:38:23,663 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4505690.0, ans=0.125
2024-08-19 19:38:32,414 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 23 from LS+wenet, 34 from Vox, 24 from AS
2024-08-19 19:38:36,977 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4505690.0, ans=0.0
2024-08-19 19:38:38,779 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4505790.0, ans=0.0
2024-08-19 19:38:40,860 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.64 vs. limit=15.0
2024-08-19 19:38:50,192 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4505790.0, ans=0.125
2024-08-19 19:38:53,636 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.33 vs. limit=6.0
2024-08-19 19:38:58,742 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 6000, loss[loss=0.09917, beats_loss=0.01054, ecapa_loss=0.0001549, whisper_loss=0.08708, over 13426.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01039, ecapa_loss=0.0001424, whisper_loss=0.08982, over 3840971.57 frames. ], batch size: 53, lr: 1.98e-03, grad_scale: 5.764607523034235e+17
2024-08-19 19:38:58,742 INFO [train_multi_KD3.py:1140] (1/4) Computing validation loss
2024-08-19 19:39:35,558 INFO [train_multi_KD3.py:1150] (1/4) Epoch 31, validation on ASR_libri: loss=0.254, beats_loss=0, ecapa_loss=0.0005172, whisper_loss=0.2488, over 931116.00 frames.
2024-08-19 19:39:57,406 INFO [train_multi_KD3.py:1150] (1/4) Epoch 31, validation on SV_voxceleb1: loss=0.003973, beats_loss=0, ecapa_loss=0.0003973, whisper_loss=0, over 944235.00 frames.
2024-08-19 19:40:32,050 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.2078, 1.7589, 1.8129, 1.7540], device='cuda:1')
2024-08-19 19:41:38,172 INFO [train_multi_KD3.py:1150] (1/4) Epoch 31, validation on AT_audioset: loss=0.02294, beats_loss=0.02294, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-19 19:41:38,182 INFO [train_multi_KD3.py:1156] (1/4) Maximum memory allocated so far is 30838MB
2024-08-19 19:42:02,662 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4505990.0, ans=0.125
2024-08-19 19:42:03,765 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 17 from LS+wenet, 9 from Vox, 29 from AS
2024-08-19 19:42:12,987 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4506090.0, ans=0.0
2024-08-19 19:42:14,762 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.20 vs. limit=15.0
2024-08-19 19:42:26,564 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4506090.0, ans=0.125
2024-08-19 19:42:34,922 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4506190.0, ans=0.0
2024-08-19 19:42:46,723 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 20 from LS+wenet, 25 from Vox, 46 from AS
2024-08-19 19:42:55,428 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.381e+01 2.694e+01 2.980e+01 4.120e+01, threshold=5.388e+01, percent-clipped=0.0
2024-08-19 19:42:58,010 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 22 from LS+wenet, 9 from Vox, 22 from AS
2024-08-19 19:43:02,715 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4506290.0, ans=0.0
2024-08-19 19:43:07,814 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 6050, loss[loss=0.1163, beats_loss=0.008767, ecapa_loss=0.0001527, whisper_loss=0.106, over 17789.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01043, ecapa_loss=0.0001418, whisper_loss=0.08963, over 3843035.50 frames. ], batch size: 70, lr: 1.98e-03, grad_scale: 5.764607523034235e+17
2024-08-19 19:43:14,925 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-19 19:43:31,383 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.79 vs. limit=15.0
2024-08-19 19:43:34,140 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 15 from LS+wenet, 14 from Vox, 24 from AS
2024-08-19 19:43:35,856 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4506490.0, ans=0.125
2024-08-19 19:44:11,325 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4506690.0, ans=0.125
2024-08-19 19:44:19,771 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4506790.0, ans=0.2
2024-08-19 19:44:25,308 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.84 vs. limit=15.0
2024-08-19 19:44:28,892 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4506790.0, ans=0.125
2024-08-19 19:44:37,711 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 6100, loss[loss=0.1004, beats_loss=0.01073, ecapa_loss=0.0001292, whisper_loss=0.08836, over 21578.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01048, ecapa_loss=0.0001404, whisper_loss=0.0902, over 3849960.25 frames. ], batch size: 87, lr: 1.98e-03, grad_scale: 5.764607523034235e+17
2024-08-19 19:44:37,891 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 15 from LS+wenet, 18 from Vox, 31 from AS
2024-08-19 19:44:56,025 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4506990.0, ans=0.0
2024-08-19 19:45:12,654 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4507090.0, ans=0.0
2024-08-19 19:45:18,945 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 22 from LS+wenet, 13 from Vox, 31 from AS
2024-08-19 19:45:33,437 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.04 vs. limit=15.0
2024-08-19 19:45:34,908 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4507190.0, ans=0.125
2024-08-19 19:45:46,029 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4507290.0, ans=0.125
2024-08-19 19:45:49,571 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4507290.0, ans=0.125
2024-08-19 19:45:51,321 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4507290.0, ans=0.125
2024-08-19 19:45:52,628 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.726e+01 2.249e+01 2.614e+01 2.889e+01 5.523e+01, threshold=5.228e+01, percent-clipped=1.0
2024-08-19 19:45:58,077 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4507290.0, ans=0.0
2024-08-19 19:46:06,398 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 32 from LS+wenet, 25 from Vox, 21 from AS
2024-08-19 19:46:07,432 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 6150, loss[loss=0.1237, beats_loss=0.006642, ecapa_loss=0.0001671, whisper_loss=0.1154, over 19874.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01049, ecapa_loss=0.0001404, whisper_loss=0.09045, over 3841894.72 frames. ], batch size: 78, lr: 1.98e-03, grad_scale: 5.764607523034235e+17
2024-08-19 19:46:15,810 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=21.19 vs. limit=22.5
2024-08-19 19:46:25,344 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.63 vs. limit=15.0
2024-08-19 19:46:28,164 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4507490.0, ans=0.125
2024-08-19 19:46:48,910 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 29 from LS+wenet, 26 from Vox, 35 from AS
2024-08-19 19:46:54,466 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 28 from LS+wenet, 21 from Vox, 31 from AS
2024-08-19 19:47:38,074 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 6200, loss[loss=0.1154, beats_loss=0.008875, ecapa_loss=0.0001341, whisper_loss=0.1052, over 24526.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01048, ecapa_loss=0.00014, whisper_loss=0.09097, over 3878800.47 frames. ], batch size: 93, lr: 1.98e-03, grad_scale: 5.764607523034235e+17
2024-08-19 19:48:05,737 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4507990.0, ans=0.0
2024-08-19 19:48:11,146 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4507990.0, ans=0.125
2024-08-19 19:48:12,985 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4508090.0, ans=0.0
2024-08-19 19:48:16,086 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4508090.0, ans=0.125
2024-08-19 19:48:31,986 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4508190.0, ans=0.125
2024-08-19 19:48:33,733 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4508190.0, ans=0.125
2024-08-19 19:48:59,692 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.806e+01 2.361e+01 2.656e+01 2.980e+01 4.502e+01, threshold=5.312e+01, percent-clipped=0.0
2024-08-19 19:49:01,986 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 18 from LS+wenet, 20 from Vox, 24 from AS
2024-08-19 19:49:08,757 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4508290.0, ans=0.125
2024-08-19 19:49:14,603 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 6250, loss[loss=0.08961, beats_loss=0.01293, ecapa_loss=0.0001009, whisper_loss=0.07568, over 14199.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01049, ecapa_loss=0.0001402, whisper_loss=0.09107, over 3876625.18 frames. ], batch size: 53, lr: 1.98e-03, grad_scale: 5.764607523034235e+17
2024-08-19 19:49:33,908 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4508490.0, ans=0.2
2024-08-19 19:49:56,182 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4508590.0, ans=0.125
2024-08-19 19:49:57,779 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 12 from LS+wenet, 19 from Vox, 20 from AS
2024-08-19 19:50:03,630 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4508590.0, ans=0.125
2024-08-19 19:50:05,405 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 29 from LS+wenet, 18 from Vox, 34 from AS
2024-08-19 19:50:19,228 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 25 from LS+wenet, 14 from Vox, 29 from AS
2024-08-19 19:50:41,454 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4508790.0, ans=0.0
2024-08-19 19:50:53,541 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4508890.0, ans=0.125
2024-08-19 19:50:55,203 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 6300, loss[loss=0.08783, beats_loss=0.0085, ecapa_loss=0.0001909, whisper_loss=0.07742, over 20006.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01053, ecapa_loss=0.0001407, whisper_loss=0.09037, over 3855392.13 frames. ], batch size: 83, lr: 1.98e-03, grad_scale: 5.764607523034235e+17
2024-08-19 19:51:02,552 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-19 19:51:02,627 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4508890.0, ans=0.2
2024-08-19 19:51:18,282 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 17 from LS+wenet, 22 from Vox, 40 from AS
2024-08-19 19:51:35,343 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 21 from LS+wenet, 27 from Vox, 42 from AS
2024-08-19 19:51:35,549 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4509090.0, ans=0.125
2024-08-19 19:51:41,785 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4509090.0, ans=0.09899494936611666
2024-08-19 19:51:47,177 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4509090.0, ans=0.125
2024-08-19 19:51:50,588 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 16 from LS+wenet, 16 from Vox, 18 from AS
2024-08-19 19:52:09,226 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4509190.0, ans=0.0
2024-08-19 19:52:14,683 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4509290.0, ans=0.04949747468305833
2024-08-19 19:52:20,372 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.295e+01 2.454e+01 2.761e+01 3.903e+01, threshold=4.908e+01, percent-clipped=0.0
2024-08-19 19:52:26,772 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.85 vs. limit=15.0
2024-08-19 19:52:34,339 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 6350, loss[loss=0.104, beats_loss=0.007794, ecapa_loss=0.0001781, whisper_loss=0.09442, over 13599.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01057, ecapa_loss=0.000141, whisper_loss=0.08959, over 3863758.87 frames. ], batch size: 56, lr: 1.98e-03, grad_scale: 5.764607523034235e+17
2024-08-19 19:52:34,756 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4509390.0, ans=0.125
2024-08-19 19:52:36,594 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4509390.0, ans=0.125
2024-08-19 19:52:44,166 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-08-19 19:53:10,591 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4509490.0, ans=0.125
2024-08-19 19:53:13,914 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4509590.0, ans=0.125
2024-08-19 19:53:24,303 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 28 from LS+wenet, 14 from Vox, 26 from AS
2024-08-19 19:53:24,735 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0
2024-08-19 19:53:28,284 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 18 from LS+wenet, 19 from Vox, 45 from AS
2024-08-19 19:53:42,242 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.92 vs. limit=15.0
2024-08-19 19:53:47,937 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4509690.0, ans=0.125
2024-08-19 19:53:53,641 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 14 from LS+wenet, 18 from Vox, 24 from AS
2024-08-19 19:54:08,077 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.19 vs. limit=15.0
2024-08-19 19:54:13,964 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 6400, loss[loss=0.08846, beats_loss=0.01365, ecapa_loss=0.0001049, whisper_loss=0.07376, over 18620.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01056, ecapa_loss=0.0001404, whisper_loss=0.08945, over 3828534.58 frames. ], batch size: 72, lr: 1.98e-03, grad_scale: 5.764607523034235e+17
2024-08-19 19:54:14,204 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 25 from LS+wenet, 12 from Vox, 14 from AS
2024-08-19 19:54:16,139 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4509890.0, ans=0.2
2024-08-19 19:54:29,309 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4509890.0, ans=0.125
2024-08-19 19:54:41,307 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 20 from LS+wenet, 17 from Vox, 41 from AS
2024-08-19 19:54:42,122 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.54 vs. limit=15.0
2024-08-19 19:54:42,906 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 23 from LS+wenet, 24 from Vox, 37 from AS
2024-08-19 19:54:59,774 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4510090.0, ans=0.0
2024-08-19 19:55:05,203 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4510090.0, ans=0.125
2024-08-19 19:55:20,666 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4510190.0, ans=0.09899494936611666
2024-08-19 19:55:38,223 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.373e+01 2.627e+01 3.161e+01 1.061e+02, threshold=5.254e+01, percent-clipped=1.0
2024-08-19 19:55:50,459 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.80 vs. limit=15.0
2024-08-19 19:55:51,588 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 6450, loss[loss=0.09816, beats_loss=0.01077, ecapa_loss=0.0001901, whisper_loss=0.08549, over 21262.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01054, ecapa_loss=0.0001414, whisper_loss=0.08961, over 3815947.47 frames. ], batch size: 92, lr: 1.98e-03, grad_scale: 5.764607523034235e+17
2024-08-19 19:56:01,210 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4510390.0, ans=0.0
2024-08-19 19:56:02,704 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 15 from LS+wenet, 16 from Vox, 30 from AS
2024-08-19 19:56:12,195 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4510490.0, ans=0.0
2024-08-19 19:56:15,643 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4510490.0, ans=0.015
2024-08-19 19:56:20,823 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4510490.0, ans=10.0
2024-08-19 19:56:20,928 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4510490.0, ans=0.125
2024-08-19 19:56:46,869 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 25 from LS+wenet, 21 from Vox, 31 from AS
2024-08-19 19:56:52,450 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 16 from LS+wenet, 24 from Vox, 29 from AS
2024-08-19 19:57:09,214 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.97 vs. limit=6.0
2024-08-19 19:57:14,856 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 12 from LS+wenet, 9 from Vox, 29 from AS
2024-08-19 19:57:18,217 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4510790.0, ans=0.0
2024-08-19 19:57:21,328 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 31 from LS+wenet, 16 from Vox, 41 from AS
2024-08-19 19:57:25,014 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4510790.0, ans=0.125
2024-08-19 19:57:28,138 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 6500, loss[loss=0.07882, beats_loss=0.01147, ecapa_loss=0.0001308, whisper_loss=0.06604, over 14506.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01052, ecapa_loss=0.0001407, whisper_loss=0.08991, over 3795784.61 frames. ], batch size: 55, lr: 1.98e-03, grad_scale: 5.764607523034235e+17
2024-08-19 19:58:13,966 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 25 from LS+wenet, 15 from Vox, 25 from AS
2024-08-19 19:58:23,572 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4511190.0, ans=0.1
2024-08-19 19:58:40,143 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 32 from LS+wenet, 21 from Vox, 38 from AS
2024-08-19 19:58:43,529 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.798e+01 2.329e+01 2.544e+01 2.954e+01 4.370e+01, threshold=5.088e+01, percent-clipped=0.0
2024-08-19 19:58:55,528 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 6550, loss[loss=0.09117, beats_loss=0.01339, ecapa_loss=0.0001408, whisper_loss=0.07637, over 20825.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01045, ecapa_loss=0.0001402, whisper_loss=0.09025, over 3831612.96 frames. ], batch size: 88, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 19:59:17,059 INFO [train_multi_KD3.py:845] (1/4) A total of 95 cuts. 35 from LS+wenet, 21 from Vox, 39 from AS
2024-08-19 19:59:46,849 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4511690.0, ans=0.125
2024-08-19 20:00:14,873 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4511790.0, ans=0.125
2024-08-19 20:00:21,665 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 6600, loss[loss=0.1089, beats_loss=0.01079, ecapa_loss=0.0001508, whisper_loss=0.09661, over 22987.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0104, ecapa_loss=0.0001408, whisper_loss=0.09045, over 3837550.66 frames. ], batch size: 95, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 20:00:29,813 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.83 vs. limit=6.0
2024-08-19 20:00:34,112 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4511890.0, ans=0.2
2024-08-19 20:00:59,844 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 26 from LS+wenet, 20 from Vox, 32 from AS
2024-08-19 20:01:00,145 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-19 20:01:01,696 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4512090.0, ans=0.1
2024-08-19 20:01:21,023 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 29 from LS+wenet, 13 from Vox, 32 from AS
2024-08-19 20:01:27,655 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 29 from LS+wenet, 16 from Vox, 24 from AS
2024-08-19 20:01:31,208 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=4512290.0, ans=0.05
2024-08-19 20:01:34,168 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.409e+01 2.632e+01 2.888e+01 4.355e+02, threshold=5.264e+01, percent-clipped=1.0
2024-08-19 20:01:45,091 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 6650, loss[loss=0.1234, beats_loss=0.01064, ecapa_loss=9.635e-05, whisper_loss=0.1118, over 17683.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01037, ecapa_loss=0.0001412, whisper_loss=0.09078, over 3842998.07 frames. ], batch size: 64, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 20:02:06,612 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 29 from LS+wenet, 21 from Vox, 41 from AS
2024-08-19 20:02:17,060 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4512590.0, ans=0.125
2024-08-19 20:02:18,396 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4512590.0, ans=0.2
2024-08-19 20:02:18,434 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4512590.0, ans=0.0
2024-08-19 20:02:49,819 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 14 from LS+wenet, 15 from Vox, 35 from AS
2024-08-19 20:02:56,433 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4512790.0, ans=0.0
2024-08-19 20:02:59,439 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4512790.0, ans=0.2
2024-08-19 20:03:02,221 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4512790.0, ans=0.125
2024-08-19 20:03:06,964 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 6700, loss[loss=0.08265, beats_loss=0.01444, ecapa_loss=0.0001011, whisper_loss=0.0672, over 16729.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01043, ecapa_loss=0.0001418, whisper_loss=0.0902, over 3857598.05 frames. ], batch size: 67, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 20:03:12,377 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 24 from LS+wenet, 26 from Vox, 41 from AS
2024-08-19 20:03:13,854 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 26 from LS+wenet, 26 from Vox, 37 from AS
2024-08-19 20:03:20,664 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 31 from LS+wenet, 34 from Vox, 27 from AS
2024-08-19 20:03:30,166 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 29 from LS+wenet, 25 from Vox, 36 from AS
2024-08-19 20:03:35,336 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 18 from LS+wenet, 17 from Vox, 22 from AS
2024-08-19 20:03:51,997 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4513090.0, ans=0.1
2024-08-19 20:03:56,901 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4513190.0, ans=0.5
2024-08-19 20:03:59,886 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 21 from LS+wenet, 24 from Vox, 29 from AS
2024-08-19 20:04:02,086 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4513190.0, ans=0.0
2024-08-19 20:04:03,765 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4513190.0, ans=0.1
2024-08-19 20:04:03,811 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4513190.0, ans=0.0
2024-08-19 20:04:17,840 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4513290.0, ans=0.125
2024-08-19 20:04:19,513 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4513290.0, ans=0.2
2024-08-19 20:04:20,696 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.359e+01 2.752e+01 3.006e+01 5.924e+01, threshold=5.504e+01, percent-clipped=1.0
2024-08-19 20:04:30,041 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 22 from LS+wenet, 16 from Vox, 33 from AS
2024-08-19 20:04:31,941 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 6750, loss[loss=0.09596, beats_loss=0.01114, ecapa_loss=0.0001302, whisper_loss=0.08351, over 17972.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01029, ecapa_loss=0.000143, whisper_loss=0.09136, over 3864700.18 frames. ], batch size: 71, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 20:04:34,282 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4513390.0, ans=0.125
2024-08-19 20:04:34,398 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4513390.0, ans=0.1
2024-08-19 20:04:56,002 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4513490.0, ans=0.1
2024-08-19 20:05:02,764 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 22 from LS+wenet, 21 from Vox, 50 from AS
2024-08-19 20:05:04,362 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4513590.0, ans=0.125
2024-08-19 20:05:11,115 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 19 from LS+wenet, 17 from Vox, 28 from AS
2024-08-19 20:05:24,533 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 31 from LS+wenet, 26 from Vox, 34 from AS
2024-08-19 20:05:35,826 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 23 from LS+wenet, 22 from Vox, 32 from AS
2024-08-19 20:05:46,317 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 26 from LS+wenet, 18 from Vox, 35 from AS
2024-08-19 20:05:56,268 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 6800, loss[loss=0.1063, beats_loss=0.01048, ecapa_loss=0.0001345, whisper_loss=0.09453, over 21879.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01028, ecapa_loss=0.0001439, whisper_loss=0.09142, over 3828294.61 frames.
], batch size: 89, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:06:04,538 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4513890.0, ans=0.1 2024-08-19 20:06:40,218 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 18 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-19 20:06:46,382 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 32 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-19 20:07:05,477 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4514290.0, ans=0.125 2024-08-19 20:07:08,248 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.965e+01 2.317e+01 2.530e+01 2.822e+01 4.267e+01, threshold=5.060e+01, percent-clipped=0.0 2024-08-19 20:07:17,725 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 6850, loss[loss=0.09994, beats_loss=0.01095, ecapa_loss=0.0001285, whisper_loss=0.0877, over 23377.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01033, ecapa_loss=0.0001425, whisper_loss=0.09132, over 3835840.58 frames. ], batch size: 93, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:07:33,349 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.85 vs. limit=15.0 2024-08-19 20:07:34,302 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 29 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-19 20:07:46,430 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-19 20:07:48,272 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.42 vs. 
limit=22.5 2024-08-19 20:07:50,077 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4514590.0, ans=0.0 2024-08-19 20:07:52,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4514590.0, ans=0.2 2024-08-19 20:07:55,228 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4514590.0, ans=0.07 2024-08-19 20:08:40,595 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 6900, loss[loss=0.121, beats_loss=0.009604, ecapa_loss=0.0001326, whisper_loss=0.1101, over 24202.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01031, ecapa_loss=0.0001418, whisper_loss=0.09171, over 3847515.88 frames. ], batch size: 94, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:09:02,787 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4514990.0, ans=0.0 2024-08-19 20:09:13,871 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 26 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-19 20:09:15,986 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4515090.0, ans=0.2 2024-08-19 20:09:27,191 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4515190.0, ans=0.1 2024-08-19 20:09:34,838 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 18 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-19 20:09:39,520 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4515190.0, ans=0.1 2024-08-19 20:09:41,667 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.46 vs. 
limit=15.0 2024-08-19 20:09:48,593 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 17 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-19 20:09:49,782 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.290e+01 2.485e+01 2.784e+01 7.248e+01, threshold=4.970e+01, percent-clipped=1.0 2024-08-19 20:09:53,857 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4515290.0, ans=0.125 2024-08-19 20:09:54,313 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.01 vs. limit=15.0 2024-08-19 20:09:58,178 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 24 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-19 20:09:59,402 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 6950, loss[loss=0.1107, beats_loss=0.00982, ecapa_loss=0.0001295, whisper_loss=0.09958, over 17728.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01035, ecapa_loss=0.0001417, whisper_loss=0.09147, over 3850345.28 frames. ], batch size: 71, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:10:24,101 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 
15 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-19 20:10:31,753 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4515590.0, ans=0.125 2024-08-19 20:10:39,944 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 20:10:45,898 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4515690.0, ans=0.125 2024-08-19 20:10:49,247 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4515690.0, ans=0.125 2024-08-19 20:10:52,215 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4515690.0, ans=0.0 2024-08-19 20:10:58,657 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 20 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-19 20:11:03,802 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4515790.0, ans=0.125 2024-08-19 20:11:14,310 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4515790.0, ans=0.125 2024-08-19 20:11:19,791 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 7000, loss[loss=0.1024, beats_loss=0.01048, ecapa_loss=0.0001306, whisper_loss=0.09059, over 23663.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01043, ecapa_loss=0.0001414, whisper_loss=0.09061, over 3813403.81 frames. ], batch size: 94, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:11:43,582 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4515990.0, ans=0.125 2024-08-19 20:11:48,347 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 
18 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-19 20:12:09,048 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 19 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-19 20:12:32,048 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.294e+01 2.487e+01 2.816e+01 5.941e+01, threshold=4.975e+01, percent-clipped=1.0 2024-08-19 20:12:34,376 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4516290.0, ans=0.1 2024-08-19 20:12:41,755 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 7050, loss[loss=0.103, beats_loss=0.006846, ecapa_loss=0.000176, whisper_loss=0.09442, over 13675.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01047, ecapa_loss=0.0001409, whisper_loss=0.09061, over 3817178.74 frames. ], batch size: 53, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:12:55,288 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4516390.0, ans=0.0 2024-08-19 20:13:03,473 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4516490.0, ans=0.2 2024-08-19 20:13:08,586 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.68 vs. limit=22.5 2024-08-19 20:13:33,791 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4516690.0, ans=0.04949747468305833 2024-08-19 20:13:35,886 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.80 vs. 
limit=10.0 2024-08-19 20:13:45,678 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4516690.0, ans=0.0 2024-08-19 20:13:54,599 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4516790.0, ans=0.0 2024-08-19 20:13:59,875 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 28 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-19 20:14:08,904 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 7100, loss[loss=0.08744, beats_loss=0.01025, ecapa_loss=0.0001499, whisper_loss=0.07569, over 19408.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01052, ecapa_loss=0.0001403, whisper_loss=0.0903, over 3810918.36 frames. ], batch size: 78, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:14:11,478 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4516890.0, ans=0.2 2024-08-19 20:14:11,786 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=11.89 vs. limit=12.0 2024-08-19 20:14:29,166 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=4516990.0, ans=0.025 2024-08-19 20:14:46,204 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4517090.0, ans=10.0 2024-08-19 20:14:56,007 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4517090.0, ans=0.0 2024-08-19 20:14:57,809 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4517190.0, ans=0.0 2024-08-19 20:15:15,221 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 
28 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-19 20:15:16,501 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 16 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-19 20:15:21,428 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.738e+01 2.242e+01 2.446e+01 2.720e+01 3.661e+01, threshold=4.892e+01, percent-clipped=0.0 2024-08-19 20:15:23,826 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4517290.0, ans=0.125 2024-08-19 20:15:25,338 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4517290.0, ans=0.0 2024-08-19 20:15:31,633 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 7150, loss[loss=0.09147, beats_loss=0.0115, ecapa_loss=0.0001304, whisper_loss=0.07867, over 19822.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01057, ecapa_loss=0.0001395, whisper_loss=0.08952, over 3784065.76 frames. ], batch size: 79, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:16:03,795 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.04 vs. limit=15.0 2024-08-19 20:16:27,573 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4517690.0, ans=0.1 2024-08-19 20:16:40,350 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 24 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-19 20:16:55,015 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 7200, loss[loss=0.09445, beats_loss=0.01052, ecapa_loss=0.0001349, whisper_loss=0.08257, over 20587.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01053, ecapa_loss=0.00014, whisper_loss=0.08983, over 3809282.36 frames. 
], batch size: 83, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:16:59,962 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4517890.0, ans=0.125 2024-08-19 20:17:23,168 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4517990.0, ans=0.2 2024-08-19 20:17:26,100 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=4518090.0, ans=0.025 2024-08-19 20:17:39,024 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.68 vs. limit=12.0 2024-08-19 20:17:41,301 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 20 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-19 20:17:49,840 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4518190.0, ans=0.0 2024-08-19 20:17:57,415 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4518190.0, ans=0.0 2024-08-19 20:18:08,012 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.764e+01 2.204e+01 2.383e+01 2.664e+01 1.113e+02, threshold=4.766e+01, percent-clipped=1.0 2024-08-19 20:18:18,050 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 7250, loss[loss=0.06974, beats_loss=0.01269, ecapa_loss=0.0001414, whisper_loss=0.05563, over 16202.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01053, ecapa_loss=0.0001389, whisper_loss=0.08949, over 3765106.86 frames. ], batch size: 67, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:18:28,481 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4518390.0, ans=0.2 2024-08-19 20:18:29,478 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 
16 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-19 20:18:30,340 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.94 vs. limit=6.0 2024-08-19 20:18:37,627 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4518490.0, ans=0.125 2024-08-19 20:18:59,364 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4518590.0, ans=0.125 2024-08-19 20:19:23,174 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4518790.0, ans=0.0 2024-08-19 20:19:39,698 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 7300, loss[loss=0.1083, beats_loss=0.01003, ecapa_loss=0.0001553, whisper_loss=0.09673, over 22027.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01043, ecapa_loss=0.0001407, whisper_loss=0.08957, over 3781419.34 frames. 
], batch size: 88, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:19:48,321 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4518890.0, ans=0.1 2024-08-19 20:20:09,248 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4518990.0, ans=0.1 2024-08-19 20:20:23,264 WARNING [optim.py:496] (1/4) Scaling gradients by 0.09269597381353378, model_norm_threshold=47.66118240356445 2024-08-19 20:20:23,421 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.810e+04, grad_sumsq=3.810e+04, orig_rms_sq=1.000e+00 2024-08-19 20:20:27,446 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4519190.0, ans=0.0 2024-08-19 20:20:50,342 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 24 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-19 20:20:53,568 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.657e+01 2.360e+01 2.664e+01 3.085e+01 5.142e+02, threshold=5.329e+01, percent-clipped=3.0 2024-08-19 20:20:53,775 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 15 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-19 20:20:53,985 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4519290.0, ans=0.04949747468305833 2024-08-19 20:21:04,143 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 7350, loss[loss=0.1014, beats_loss=0.01078, ecapa_loss=0.0001085, whisper_loss=0.08955, over 21500.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01046, ecapa_loss=0.0001407, whisper_loss=0.08926, over 3800089.87 frames. 
], batch size: 83, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:21:10,353 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4519390.0, ans=0.125 2024-08-19 20:21:21,049 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4519490.0, ans=0.125 2024-08-19 20:21:28,248 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4519490.0, ans=0.125 2024-08-19 20:21:32,184 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 28 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-19 20:21:34,574 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=5.119e+05 2024-08-19 20:21:36,126 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.58 vs. limit=10.0 2024-08-19 20:21:38,160 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4519490.0, ans=0.125 2024-08-19 20:22:13,540 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.43 vs. limit=22.5 2024-08-19 20:22:21,005 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4519790.0, ans=0.2 2024-08-19 20:22:21,576 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.65 vs. limit=22.5 2024-08-19 20:22:22,815 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 
24 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-19 20:22:28,514 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4519790.0, ans=0.95 2024-08-19 20:22:35,260 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 7400, loss[loss=0.08891, beats_loss=0.009288, ecapa_loss=0.0001358, whisper_loss=0.07826, over 14161.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01044, ecapa_loss=0.0001411, whisper_loss=0.08949, over 3785112.62 frames. ], batch size: 57, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:22:38,785 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4519890.0, ans=0.125 2024-08-19 20:22:55,990 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4519990.0, ans=0.125 2024-08-19 20:23:10,041 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4520090.0, ans=0.125 2024-08-19 20:23:10,169 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4520090.0, ans=0.125 2024-08-19 20:23:17,708 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4520090.0, ans=0.125 2024-08-19 20:23:36,601 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 
19 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-19 20:23:40,462 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4520190.0, ans=0.125 2024-08-19 20:23:40,566 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4520190.0, ans=0.125 2024-08-19 20:23:44,089 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4520190.0, ans=0.1 2024-08-19 20:23:52,132 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 17 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-19 20:23:53,765 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.575e+01 2.289e+01 2.494e+01 2.789e+01 3.959e+01, threshold=4.988e+01, percent-clipped=0.0 2024-08-19 20:23:55,468 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 19 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-19 20:23:59,816 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4520290.0, ans=0.125 2024-08-19 20:24:04,773 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 7450, loss[loss=0.1009, beats_loss=0.01007, ecapa_loss=0.0001106, whisper_loss=0.08968, over 19693.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01037, ecapa_loss=0.0001423, whisper_loss=0.08986, over 3772928.85 frames. 
], batch size: 76, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:24:18,052 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4520390.0, ans=0.125 2024-08-19 20:24:19,712 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4520390.0, ans=0.125 2024-08-19 20:24:21,163 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4520490.0, ans=0.125 2024-08-19 20:24:21,438 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.76 vs. limit=15.0 2024-08-19 20:24:23,150 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4520490.0, ans=0.125 2024-08-19 20:24:30,550 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.63 vs. limit=15.0 2024-08-19 20:24:59,649 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 21 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-19 20:25:07,771 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=4520690.0, ans=15.0 2024-08-19 20:25:14,652 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. limit=6.0 2024-08-19 20:25:14,765 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.20 vs. 
limit=6.0 2024-08-19 20:25:23,868 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4520790.0, ans=0.0 2024-08-19 20:25:35,141 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 7500, loss[loss=0.1033, beats_loss=0.01232, ecapa_loss=9.896e-05, whisper_loss=0.08999, over 21639.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01042, ecapa_loss=0.0001417, whisper_loss=0.08936, over 3767041.38 frames. ], batch size: 81, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:25:35,391 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 25 from LS+wenet, 31 from Vox, 32 fro AS 2024-08-19 20:25:52,001 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 22 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-19 20:25:52,221 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4520990.0, ans=0.0 2024-08-19 20:25:52,306 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4520990.0, ans=0.125 2024-08-19 20:25:57,927 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4520990.0, ans=0.1 2024-08-19 20:26:05,800 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.22 vs. limit=10.0 2024-08-19 20:26:09,139 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4520990.0, ans=0.2 2024-08-19 20:26:19,480 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 26 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-19 20:26:55,426 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.86 vs. 
limit=22.5 2024-08-19 20:26:56,285 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.301e+01 2.566e+01 2.939e+01 6.434e+01, threshold=5.131e+01, percent-clipped=1.0 2024-08-19 20:27:06,800 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 7550, loss[loss=0.07449, beats_loss=0.01346, ecapa_loss=0.0001257, whisper_loss=0.05978, over 18630.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01049, ecapa_loss=0.0001406, whisper_loss=0.08906, over 3798086.27 frames. ], batch size: 77, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:27:09,197 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4521390.0, ans=0.125 2024-08-19 20:27:11,434 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 19 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-19 20:27:13,152 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 26 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-19 20:27:17,221 INFO [train_multi_KD3.py:845] (1/4) A total of 95 cuts. 38 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-19 20:27:18,815 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 27 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-19 20:27:22,733 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 19 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-19 20:27:33,631 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 17 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-19 20:27:35,869 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 28 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-19 20:27:46,396 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 
25 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-19 20:28:02,098 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4521690.0, ans=0.125 2024-08-19 20:28:22,768 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4521790.0, ans=0.0 2024-08-19 20:28:29,933 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 19 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-19 20:28:37,514 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4521790.0, ans=0.2 2024-08-19 20:28:40,256 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 7600, loss[loss=0.09751, beats_loss=0.01177, ecapa_loss=0.000107, whisper_loss=0.08467, over 24010.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01046, ecapa_loss=0.0001415, whisper_loss=0.08927, over 3806775.49 frames. ], batch size: 92, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:28:47,952 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 23 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-19 20:28:54,879 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4521890.0, ans=0.1 2024-08-19 20:28:54,986 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=4521890.0, ans=0.2 2024-08-19 20:29:00,348 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4521990.0, ans=0.125 2024-08-19 20:29:08,185 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 22 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-19 20:29:26,794 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4522090.0, ans=0.0 2024-08-19 20:29:33,886 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 
23 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-19 20:29:41,438 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 21 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-19 20:29:48,193 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.52 vs. limit=15.0 2024-08-19 20:30:04,590 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.242e+01 2.534e+01 2.867e+01 1.676e+03, threshold=5.067e+01, percent-clipped=0.0 2024-08-19 20:30:04,591 WARNING [optim.py:496] (1/4) Scaling gradients by 0.030242323875427246, model_norm_threshold=50.67152404785156 2024-08-19 20:30:04,747 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.28, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.977e+05, grad_sumsq=7.564e+07, orig_rms_sq=1.055e-02 2024-08-19 20:30:15,788 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 7650, loss[loss=0.09472, beats_loss=0.01175, ecapa_loss=0.0001542, whisper_loss=0.08142, over 21094.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01045, ecapa_loss=0.0001414, whisper_loss=0.08949, over 3810864.65 frames. ], batch size: 88, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:30:23,498 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 22 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-19 20:30:38,601 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4522490.0, ans=0.09899494936611666 2024-08-19 20:31:08,964 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.48 vs. limit=10.0 2024-08-19 20:31:12,917 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 27 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-19 20:31:32,206 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 
19 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-19 20:31:40,390 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4522790.0, ans=0.0 2024-08-19 20:31:45,293 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.38 vs. limit=15.0 2024-08-19 20:31:50,081 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 7700, loss[loss=0.09365, beats_loss=0.008597, ecapa_loss=0.0001145, whisper_loss=0.08391, over 15586.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01041, ecapa_loss=0.0001412, whisper_loss=0.0898, over 3799730.62 frames. ], batch size: 57, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:31:59,444 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 39 from LS+wenet, 12 from Vox, 39 fro AS 2024-08-19 20:31:59,697 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4522890.0, ans=0.0 2024-08-19 20:32:03,680 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4522890.0, ans=0.0 2024-08-19 20:32:09,184 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4522990.0, ans=0.95 2024-08-19 20:32:16,258 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.84 vs. limit=10.0 2024-08-19 20:32:21,498 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4522990.0, ans=0.0 2024-08-19 20:32:29,894 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 
23 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-19 20:32:35,334 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4523090.0, ans=10.0 2024-08-19 20:32:42,515 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4523190.0, ans=0.125 2024-08-19 20:33:10,212 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.323e+01 2.517e+01 2.796e+01 4.474e+01, threshold=5.035e+01, percent-clipped=1.0 2024-08-19 20:33:19,413 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 7750, loss[loss=0.09904, beats_loss=0.008617, ecapa_loss=0.0001683, whisper_loss=0.08874, over 17947.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01041, ecapa_loss=0.0001401, whisper_loss=0.08955, over 3801999.92 frames. ], batch size: 75, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:33:19,920 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4523390.0, ans=0.2 2024-08-19 20:33:23,632 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4523390.0, ans=0.125 2024-08-19 20:33:46,139 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4523490.0, ans=0.125 2024-08-19 20:33:47,570 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4523490.0, ans=0.0 2024-08-19 20:33:51,552 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4523490.0, ans=0.0 2024-08-19 20:33:55,714 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4523590.0, ans=0.125 2024-08-19 20:34:07,845 INFO [scaling.py:1024] (1/4) Whitening: 
name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.63 vs. limit=10.0 2024-08-19 20:34:19,525 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4523690.0, ans=0.125 2024-08-19 20:34:39,790 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 24 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-19 20:34:48,631 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 24 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-19 20:34:50,101 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 7800, loss[loss=0.1119, beats_loss=0.007953, ecapa_loss=0.0001596, whisper_loss=0.1023, over 16520.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01028, ecapa_loss=0.0001407, whisper_loss=0.09058, over 3814634.07 frames. ], batch size: 64, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:34:55,566 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 25 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-19 20:35:08,421 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4523990.0, ans=0.1 2024-08-19 20:35:11,431 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4523990.0, ans=0.1 2024-08-19 20:35:23,929 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4524090.0, ans=0.125 2024-08-19 20:35:37,704 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 
31 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-19 20:35:42,862 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4524190.0, ans=0.125 2024-08-19 20:35:45,072 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.57 vs. limit=15.0 2024-08-19 20:35:46,534 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4524190.0, ans=0.09899494936611666 2024-08-19 20:35:55,181 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4524190.0, ans=0.2 2024-08-19 20:35:56,419 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 25 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-19 20:36:09,154 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.228e+01 2.464e+01 2.830e+01 4.593e+01, threshold=4.929e+01, percent-clipped=0.0 2024-08-19 20:36:13,855 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.86 vs. limit=15.0 2024-08-19 20:36:15,078 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 22 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-19 20:36:18,138 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 7850, loss[loss=0.09912, beats_loss=0.0103, ecapa_loss=0.0001577, whisper_loss=0.08724, over 21645.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.0104, ecapa_loss=0.0001405, whisper_loss=0.08928, over 3819540.95 frames. 
], batch size: 89, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:36:18,598 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4524390.0, ans=0.125 2024-08-19 20:36:36,936 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=4524490.0, ans=0.5 2024-08-19 20:36:38,596 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4524490.0, ans=0.0 2024-08-19 20:37:06,980 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.52 vs. limit=10.0 2024-08-19 20:37:09,772 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4524690.0, ans=10.0 2024-08-19 20:37:18,670 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.70 vs. limit=6.0 2024-08-19 20:37:20,075 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 16 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-19 20:37:46,787 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 7900, loss[loss=0.1071, beats_loss=0.009091, ecapa_loss=0.000204, whisper_loss=0.09594, over 20623.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01043, ecapa_loss=0.0001409, whisper_loss=0.08901, over 3821281.91 frames. 
], batch size: 92, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:38:17,448 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4524990.0, ans=0.0 2024-08-19 20:38:21,569 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=4525090.0, ans=0.025 2024-08-19 20:38:24,595 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4525090.0, ans=0.07 2024-08-19 20:38:52,601 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.99 vs. limit=22.5 2024-08-19 20:39:04,659 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=13.85 vs. limit=15.0 2024-08-19 20:39:06,769 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.308e+01 2.634e+01 2.974e+01 4.173e+01, threshold=5.267e+01, percent-clipped=0.0 2024-08-19 20:39:08,809 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4525290.0, ans=0.2 2024-08-19 20:39:15,482 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 7950, loss[loss=0.08514, beats_loss=0.01083, ecapa_loss=0.0001801, whisper_loss=0.07251, over 15732.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01041, ecapa_loss=0.0001406, whisper_loss=0.08934, over 3807398.81 frames. 
], batch size: 71, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:39:31,761 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4525490.0, ans=0.125 2024-08-19 20:39:37,211 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4525490.0, ans=0.125 2024-08-19 20:39:44,025 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4525490.0, ans=0.125 2024-08-19 20:39:50,770 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 25 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-19 20:40:06,390 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4525690.0, ans=0.125 2024-08-19 20:40:11,275 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 17 from LS+wenet, 16 from Vox, 17 fro AS 2024-08-19 20:40:20,937 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.36 vs. limit=10.0 2024-08-19 20:40:32,113 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4525790.0, ans=0.0 2024-08-19 20:40:42,444 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 8000, loss[loss=0.1214, beats_loss=0.008186, ecapa_loss=0.0001492, whisper_loss=0.1117, over 16025.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01033, ecapa_loss=0.00014, whisper_loss=0.08998, over 3801763.20 frames. ], batch size: 62, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:40:43,137 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.55 vs. 
limit=10.0 2024-08-19 20:40:51,853 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4525890.0, ans=0.04949747468305833 2024-08-19 20:40:59,125 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4525990.0, ans=0.2 2024-08-19 20:40:59,130 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4525990.0, ans=0.0 2024-08-19 20:41:22,563 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 16 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-19 20:41:48,511 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4526190.0, ans=0.125 2024-08-19 20:41:51,810 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 24 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-19 20:41:55,904 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.80 vs. limit=15.0 2024-08-19 20:42:05,245 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.388e+01 2.587e+01 2.894e+01 4.259e+01, threshold=5.174e+01, percent-clipped=0.0 2024-08-19 20:42:08,221 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4526290.0, ans=0.0 2024-08-19 20:42:09,810 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 36 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-19 20:42:10,408 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.84 vs. limit=15.0 2024-08-19 20:42:15,376 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 8050, loss[loss=0.11, beats_loss=0.007546, ecapa_loss=0.0001685, whisper_loss=0.1008, over 16226.00 frames. 
], tot_loss[loss=0.102, beats_loss=0.01033, ecapa_loss=0.0001405, whisper_loss=0.09031, over 3795751.30 frames. ], batch size: 68, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:42:22,858 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4526390.0, ans=0.09899494936611666 2024-08-19 20:42:39,491 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.88 vs. limit=12.0 2024-08-19 20:43:09,841 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.22 vs. limit=15.0 2024-08-19 20:43:30,075 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4526790.0, ans=0.125 2024-08-19 20:43:39,947 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 19 from LS+wenet, 27 from Vox, 44 fro AS 2024-08-19 20:43:49,411 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 8100, loss[loss=0.09586, beats_loss=0.009266, ecapa_loss=0.0001428, whisper_loss=0.08517, over 20371.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01036, ecapa_loss=0.0001411, whisper_loss=0.09054, over 3800853.71 frames. ], batch size: 82, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:43:49,581 INFO [train_multi_KD3.py:845] (1/4) A total of 49 cuts. 10 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-19 20:44:01,054 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4526890.0, ans=0.125 2024-08-19 20:44:31,169 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.45 vs. limit=15.0 2024-08-19 20:44:46,687 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 
19 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-19 20:44:49,244 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4527190.0, ans=0.0 2024-08-19 20:44:50,437 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 21 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-19 20:44:53,124 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4527190.0, ans=0.125 2024-08-19 20:45:11,271 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.36 vs. limit=10.0 2024-08-19 20:45:20,507 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.797e+01 2.340e+01 2.531e+01 2.954e+01 4.685e+01, threshold=5.063e+01, percent-clipped=0.0 2024-08-19 20:45:30,750 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 8150, loss[loss=0.1188, beats_loss=0.009527, ecapa_loss=0.0001296, whisper_loss=0.108, over 24209.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01042, ecapa_loss=0.0001405, whisper_loss=0.09057, over 3805690.90 frames. ], batch size: 90, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:45:41,250 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4527390.0, ans=0.125 2024-08-19 20:45:56,325 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=4527490.0, ans=0.1 2024-08-19 20:46:03,190 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4527490.0, ans=0.0 2024-08-19 20:46:35,978 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 30 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-19 20:46:49,862 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 
14 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-19 20:46:59,161 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.81 vs. limit=15.0 2024-08-19 20:47:07,990 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 8200, loss[loss=0.08637, beats_loss=0.01038, ecapa_loss=9.807e-05, whisper_loss=0.07501, over 14939.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01039, ecapa_loss=0.0001404, whisper_loss=0.09099, over 3820767.18 frames. ], batch size: 53, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:47:28,297 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4527990.0, ans=0.1 2024-08-19 20:47:36,079 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.02 vs. limit=22.5 2024-08-19 20:47:57,410 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.344e+00 2024-08-19 20:48:35,366 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.270e+01 2.400e+01 2.604e+01 4.192e+01, threshold=4.800e+01, percent-clipped=0.0 2024-08-19 20:48:35,622 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 33 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-19 20:48:39,059 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4528290.0, ans=0.125 2024-08-19 20:48:44,749 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 8250, loss[loss=0.113, beats_loss=0.009678, ecapa_loss=0.0001285, whisper_loss=0.102, over 23651.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01043, ecapa_loss=0.0001407, whisper_loss=0.09055, over 3833528.52 frames. 
], batch size: 92, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:49:24,961 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.15 vs. limit=15.0 2024-08-19 20:50:02,462 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4528790.0, ans=0.025 2024-08-19 20:50:20,491 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 8300, loss[loss=0.1202, beats_loss=0.009088, ecapa_loss=0.000137, whisper_loss=0.1097, over 15991.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01042, ecapa_loss=0.000141, whisper_loss=0.09037, over 3824715.53 frames. ], batch size: 58, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:50:25,834 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 34 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-19 20:50:33,248 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4528890.0, ans=0.0 2024-08-19 20:50:36,513 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4528990.0, ans=10.0 2024-08-19 20:51:14,617 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4529190.0, ans=0.1 2024-08-19 20:51:17,568 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.31 vs. limit=15.0 2024-08-19 20:51:25,073 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 22 from LS+wenet, 11 from Vox, 22 fro AS 2024-08-19 20:51:25,926 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.00 vs. 
limit=10.0 2024-08-19 20:51:42,723 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.312e+01 2.525e+01 2.734e+01 6.042e+01, threshold=5.050e+01, percent-clipped=1.0 2024-08-19 20:51:42,997 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 18 from LS+wenet, 28 from Vox, 43 fro AS 2024-08-19 20:51:50,625 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 20 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-19 20:51:51,661 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 8350, loss[loss=0.1019, beats_loss=0.009584, ecapa_loss=0.0001322, whisper_loss=0.09096, over 16349.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0104, ecapa_loss=0.000141, whisper_loss=0.09051, over 3845641.20 frames. ], batch size: 63, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:52:25,346 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4529490.0, ans=0.0 2024-08-19 20:52:32,642 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 28 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-19 20:52:39,231 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4529590.0, ans=0.2 2024-08-19 20:52:53,980 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 20:53:05,210 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4529690.0, ans=0.125 2024-08-19 20:53:14,299 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 
24 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-19 20:53:21,803 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4529790.0, ans=0.5 2024-08-19 20:53:25,495 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4529790.0, ans=0.0 2024-08-19 20:53:29,283 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 8400, loss[loss=0.1157, beats_loss=0.009661, ecapa_loss=0.0001711, whisper_loss=0.1043, over 21541.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01035, ecapa_loss=0.0001414, whisper_loss=0.09063, over 3794193.97 frames. ], batch size: 91, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:53:31,102 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=4529890.0, ans=0.95 2024-08-19 20:53:58,726 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4529990.0, ans=0.04949747468305833 2024-08-19 20:54:43,229 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 28 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-19 20:54:48,932 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.271e+01 2.577e+01 2.838e+01 4.178e+01, threshold=5.155e+01, percent-clipped=0.0 2024-08-19 20:54:55,600 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 33 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-19 20:54:59,405 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 8450, loss[loss=0.08543, beats_loss=0.0129, ecapa_loss=0.0001434, whisper_loss=0.07109, over 21292.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01032, ecapa_loss=0.000142, whisper_loss=0.09082, over 3793334.35 frames. ], batch size: 91, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:55:13,663 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 
22 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-19 20:55:29,602 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 20 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-19 20:55:32,204 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.19 vs. limit=15.0 2024-08-19 20:55:35,882 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4530490.0, ans=0.0 2024-08-19 20:55:38,034 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4530590.0, ans=0.125 2024-08-19 20:55:43,464 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 10 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-19 20:56:19,009 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 25 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-19 20:56:33,781 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 34 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-19 20:56:40,127 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 8500, loss[loss=0.07222, beats_loss=0.009645, ecapa_loss=0.000152, whisper_loss=0.06106, over 14007.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01039, ecapa_loss=0.0001422, whisper_loss=0.08956, over 3766600.63 frames. ], batch size: 56, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:56:40,710 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.58 vs. 
limit=15.0
2024-08-19 20:57:03,370 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4530990.0, ans=0.125
2024-08-19 20:57:39,420 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4531190.0, ans=0.125
2024-08-19 20:57:41,229 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 21 from LS+wenet, 17 from Vox, 22 from AS
2024-08-19 20:57:43,992 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4531190.0, ans=0.0
2024-08-19 20:58:13,364 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.952e+01 2.342e+01 2.609e+01 2.850e+01 2.704e+02, threshold=5.218e+01, percent-clipped=2.0
2024-08-19 20:58:16,119 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4531290.0, ans=0.95
2024-08-19 20:58:23,654 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 8550, loss[loss=0.1153, beats_loss=0.01114, ecapa_loss=0.0001269, whisper_loss=0.1028, over 21970.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01043, ecapa_loss=0.0001422, whisper_loss=0.08972, over 3785394.92 frames. ], batch size: 89, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 20:58:42,308 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 32 from LS+wenet, 15 from Vox, 35 from AS
2024-08-19 20:59:11,949 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4531590.0, ans=0.125
2024-08-19 20:59:16,203 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.41 vs. limit=15.0
2024-08-19 20:59:21,635 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4531690.0, ans=0.1
2024-08-19 20:59:42,617 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.90 vs. limit=15.0
2024-08-19 20:59:44,232 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.32 vs. limit=22.5
2024-08-19 20:59:45,149 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 24 from LS+wenet, 16 from Vox, 34 from AS
2024-08-19 20:59:52,980 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4531790.0, ans=0.0
2024-08-19 20:59:59,581 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 8600, loss[loss=0.1033, beats_loss=0.008371, ecapa_loss=0.0001477, whisper_loss=0.09341, over 19288.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01043, ecapa_loss=0.0001408, whisper_loss=0.09032, over 3786517.24 frames. ], batch size: 73, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 21:00:14,997 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4531890.0, ans=0.2
2024-08-19 21:00:27,545 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4531990.0, ans=0.05
2024-08-19 21:00:30,603 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 17 from LS+wenet, 17 from Vox, 27 from AS
2024-08-19 21:00:36,874 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4531990.0, ans=0.0
2024-08-19 21:00:47,464 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.89 vs. limit=15.0
2024-08-19 21:00:47,891 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.45 vs. limit=10.0
2024-08-19 21:01:03,572 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4532190.0, ans=0.125
2024-08-19 21:01:05,537 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4532190.0, ans=0.0
2024-08-19 21:01:15,176 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4532190.0, ans=0.2
2024-08-19 21:01:30,129 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.732e+01 2.352e+01 2.546e+01 2.927e+01 4.092e+01, threshold=5.093e+01, percent-clipped=0.0
2024-08-19 21:01:39,166 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 8650, loss[loss=0.1017, beats_loss=0.01027, ecapa_loss=0.0001356, whisper_loss=0.09009, over 23699.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01039, ecapa_loss=0.0001415, whisper_loss=0.09077, over 3797327.40 frames. ], batch size: 94, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 21:01:39,656 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.141e+01
2024-08-19 21:01:48,126 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 27 from LS+wenet, 27 from Vox, 40 from AS
2024-08-19 21:01:59,297 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 18 from LS+wenet, 20 from Vox, 24 from AS
2024-08-19 21:02:00,754 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4532490.0, ans=0.125
2024-08-19 21:02:02,485 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4532490.0, ans=0.125
2024-08-19 21:02:03,957 INFO [train_multi_KD3.py:845] (1/4) A total of 52 cuts. 14 from LS+wenet, 14 from Vox, 24 from AS
2024-08-19 21:02:48,450 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4532690.0, ans=0.0
2024-08-19 21:03:09,285 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 32 from LS+wenet, 25 from Vox, 30 from AS
2024-08-19 21:03:12,606 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 33 from LS+wenet, 21 from Vox, 40 from AS
2024-08-19 21:03:14,435 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 8700, loss[loss=0.108, beats_loss=0.0101, ecapa_loss=0.0001291, whisper_loss=0.09659, over 24136.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01034, ecapa_loss=0.0001426, whisper_loss=0.09067, over 3805874.73 frames. ], batch size: 94, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 21:03:27,150 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4532890.0, ans=0.0
2024-08-19 21:03:33,519 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 22 from LS+wenet, 12 from Vox, 17 from AS
2024-08-19 21:03:42,698 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4532990.0, ans=0.125
2024-08-19 21:03:59,655 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4533090.0, ans=0.1
2024-08-19 21:04:16,806 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.90 vs. limit=22.5
2024-08-19 21:04:19,530 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 23 from LS+wenet, 13 from Vox, 26 from AS
2024-08-19 21:04:25,185 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 18 from LS+wenet, 21 from Vox, 42 from AS
2024-08-19 21:04:34,485 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.263e+01 2.457e+01 2.766e+01 3.922e+01, threshold=4.914e+01, percent-clipped=0.0
2024-08-19 21:04:43,639 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 8750, loss[loss=0.1096, beats_loss=0.008874, ecapa_loss=0.0001383, whisper_loss=0.09929, over 18549.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01031, ecapa_loss=0.0001416, whisper_loss=0.09127, over 3802794.07 frames. ], batch size: 75, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 21:04:51,635 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4533390.0, ans=0.125
2024-08-19 21:05:09,518 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4533490.0, ans=0.0
2024-08-19 21:05:12,006 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4533490.0, ans=0.125
2024-08-19 21:05:22,825 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.82 vs. limit=22.5
2024-08-19 21:05:42,843 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4533690.0, ans=0.125
2024-08-19 21:05:52,479 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 22 from LS+wenet, 20 from Vox, 29 from AS
2024-08-19 21:06:02,543 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4533790.0, ans=0.0
2024-08-19 21:06:02,568 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4533790.0, ans=0.0
2024-08-19 21:06:02,579 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4533790.0, ans=0.0
2024-08-19 21:06:16,190 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4533890.0, ans=0.0
2024-08-19 21:06:16,981 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 8800, loss[loss=0.1173, beats_loss=0.009017, ecapa_loss=0.0001462, whisper_loss=0.1068, over 23118.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01039, ecapa_loss=0.0001405, whisper_loss=0.0906, over 3786317.53 frames. ], batch size: 89, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 21:06:24,810 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4533890.0, ans=0.05
2024-08-19 21:06:25,692 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 29 from LS+wenet, 25 from Vox, 28 from AS
2024-08-19 21:06:37,581 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4533990.0, ans=0.125
2024-08-19 21:06:37,786 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.54 vs. limit=12.0
2024-08-19 21:06:38,535 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4533990.0, ans=0.125
2024-08-19 21:07:08,951 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 26 from LS+wenet, 20 from Vox, 42 from AS
2024-08-19 21:07:11,294 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4534190.0, ans=0.125
2024-08-19 21:07:29,957 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.79 vs. limit=15.0
2024-08-19 21:07:32,744 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4534290.0, ans=0.1
2024-08-19 21:07:33,671 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.239e+01 2.468e+01 2.713e+01 3.674e+01, threshold=4.936e+01, percent-clipped=0.0
2024-08-19 21:07:40,831 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 20 from LS+wenet, 25 from Vox, 20 from AS
2024-08-19 21:07:42,247 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 8850, loss[loss=0.1006, beats_loss=0.007909, ecapa_loss=0.0001862, whisper_loss=0.09085, over 16434.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01046, ecapa_loss=0.0001399, whisper_loss=0.0902, over 3759295.80 frames. ], batch size: 65, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 21:07:59,366 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 26 from LS+wenet, 19 from Vox, 22 from AS
2024-08-19 21:08:04,095 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 33 from LS+wenet, 18 from Vox, 37 from AS
2024-08-19 21:08:10,596 WARNING [optim.py:496] (1/4) Scaling gradients by 0.05975715443491936, model_norm_threshold=49.35716247558594
2024-08-19 21:08:10,753 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.conv_module1.depthwise_conv.causal_conv.weight with proportion 0.10, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.671e+04, grad_sumsq=1.079e+05, orig_rms_sq=6.184e-01
2024-08-19 21:08:14,499 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4534590.0, ans=0.2
2024-08-19 21:08:38,676 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4534690.0, ans=0.125
2024-08-19 21:09:03,742 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 8900, loss[loss=0.1193, beats_loss=0.009692, ecapa_loss=0.0001451, whisper_loss=0.1081, over 22651.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01039, ecapa_loss=0.0001404, whisper_loss=0.09015, over 3748828.00 frames. ], batch size: 87, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 21:09:05,398 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4534890.0, ans=0.125
2024-08-19 21:09:07,604 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4534890.0, ans=0.0
2024-08-19 21:09:18,856 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4534990.0, ans=0.0
2024-08-19 21:09:21,745 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4534990.0, ans=0.1
2024-08-19 21:09:30,421 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4534990.0, ans=0.125
2024-08-19 21:09:58,145 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4535190.0, ans=0.2
2024-08-19 21:10:05,533 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4535290.0, ans=0.1
2024-08-19 21:10:10,155 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 39 from LS+wenet, 20 from Vox, 30 from AS
2024-08-19 21:10:13,612 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4535290.0, ans=0.125
2024-08-19 21:10:14,503 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.373e+01 2.651e+01 2.938e+01 8.260e+02, threshold=5.301e+01, percent-clipped=3.0
2024-08-19 21:10:22,801 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 8950, loss[loss=0.1048, beats_loss=0.01016, ecapa_loss=0.0001645, whisper_loss=0.09295, over 21435.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01038, ecapa_loss=0.0001406, whisper_loss=0.08996, over 3756952.82 frames. ], batch size: 88, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 21:10:32,190 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4535390.0, ans=0.125
2024-08-19 21:10:32,207 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4535390.0, ans=0.0
2024-08-19 21:10:47,696 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4535490.0, ans=0.2
2024-08-19 21:10:53,021 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4535490.0, ans=0.0
2024-08-19 21:10:54,641 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4535490.0, ans=0.125
2024-08-19 21:11:01,269 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4535590.0, ans=0.0
2024-08-19 21:11:04,971 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4535590.0, ans=0.2
2024-08-19 21:11:27,671 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4535690.0, ans=0.0
2024-08-19 21:11:32,969 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4535790.0, ans=0.0
2024-08-19 21:11:49,727 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=4535890.0, ans=0.5
2024-08-19 21:11:50,876 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 9000, loss[loss=0.1113, beats_loss=0.01058, ecapa_loss=0.0001439, whisper_loss=0.09927, over 19069.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01043, ecapa_loss=0.0001407, whisper_loss=0.08993, over 3806152.44 frames. ], batch size: 74, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 21:11:50,876 INFO [train_multi_KD3.py:1140] (1/4) Computing validation loss
2024-08-19 21:12:26,936 INFO [train_multi_KD3.py:1150] (1/4) Epoch 31, validation on ASR_libri: loss=0.2531, beats_loss=0, ecapa_loss=0.0005115, whisper_loss=0.248, over 931116.00 frames.
2024-08-19 21:12:49,561 INFO [train_multi_KD3.py:1150] (1/4) Epoch 31, validation on SV_voxceleb1: loss=0.003978, beats_loss=0, ecapa_loss=0.0003978, whisper_loss=0, over 944235.00 frames.
2024-08-19 21:14:27,781 INFO [train_multi_KD3.py:1150] (1/4) Epoch 31, validation on AT_audioset: loss=0.02297, beats_loss=0.02297, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-19 21:14:27,785 INFO [train_multi_KD3.py:1156] (1/4) Maximum memory allocated so far is 30838MB
2024-08-19 21:14:31,699 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0
2024-08-19 21:15:03,049 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4535990.0, ans=0.125
2024-08-19 21:15:26,854 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4536090.0, ans=0.125
2024-08-19 21:15:31,392 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4536190.0, ans=0.1
2024-08-19 21:16:01,550 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.332e+01 2.634e+01 2.859e+01 8.780e+01, threshold=5.268e+01, percent-clipped=1.0
2024-08-19 21:16:11,959 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 9050, loss[loss=0.1092, beats_loss=0.01042, ecapa_loss=0.0001353, whisper_loss=0.09738, over 18570.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01046, ecapa_loss=0.0001398, whisper_loss=0.08992, over 3792477.63 frames. ], batch size: 71, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 21:16:46,143 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4536490.0, ans=0.125
2024-08-19 21:16:52,226 WARNING [optim.py:496] (1/4) Scaling gradients by 0.07672171294689178, model_norm_threshold=52.675148010253906
2024-08-19 21:16:52,383 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.1.norm.log_scale with proportion 0.26, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.217e+05, grad_sumsq=1.217e+05, orig_rms_sq=1.000e+00
2024-08-19 21:16:54,673 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4536590.0, ans=0.125
2024-08-19 21:16:56,547 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4536590.0, ans=0.125
2024-08-19 21:17:06,561 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 18 from LS+wenet, 25 from Vox, 47 from AS
2024-08-19 21:17:08,643 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4536590.0, ans=0.0
2024-08-19 21:17:14,886 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4536690.0, ans=0.125
2024-08-19 21:17:26,385 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4536690.0, ans=0.0
2024-08-19 21:17:44,478 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-19 21:17:52,592 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 9100, loss[loss=0.09338, beats_loss=0.01202, ecapa_loss=0.0001423, whisper_loss=0.07994, over 18535.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01059, ecapa_loss=0.0001391, whisper_loss=0.08943, over 3805242.31 frames. ], batch size: 75, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 21:17:54,859 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4536890.0, ans=0.1
2024-08-19 21:18:05,451 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4536890.0, ans=0.1
2024-08-19 21:18:26,756 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4536990.0, ans=0.2
2024-08-19 21:18:52,120 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 20 from LS+wenet, 17 from Vox, 41 from AS
2024-08-19 21:19:00,017 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 18 from LS+wenet, 11 from Vox, 22 from AS
2024-08-19 21:19:00,311 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4537190.0, ans=0.2
2024-08-19 21:19:20,657 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4537290.0, ans=0.1
2024-08-19 21:19:20,749 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4537290.0, ans=0.0
2024-08-19 21:19:24,571 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.552e+01 2.187e+01 2.511e+01 2.857e+01 6.866e+02, threshold=5.022e+01, percent-clipped=2.0
2024-08-19 21:19:32,330 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 23 from LS+wenet, 16 from Vox, 44 from AS
2024-08-19 21:19:34,240 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 9150, loss[loss=0.08197, beats_loss=0.01415, ecapa_loss=0.000121, whisper_loss=0.06661, over 20018.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01063, ecapa_loss=0.0001384, whisper_loss=0.08874, over 3791631.90 frames. ], batch size: 83, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 21:19:40,851 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 25 from LS+wenet, 23 from Vox, 27 from AS
2024-08-19 21:19:47,194 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4537390.0, ans=0.1
2024-08-19 21:19:56,571 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 17 from LS+wenet, 19 from Vox, 32 from AS
2024-08-19 21:20:31,711 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4537690.0, ans=0.1
2024-08-19 21:20:58,382 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4537790.0, ans=0.0
2024-08-19 21:21:04,289 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4537790.0, ans=0.125
2024-08-19 21:21:09,824 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 9200, loss[loss=0.0885, beats_loss=0.01061, ecapa_loss=0.0001606, whisper_loss=0.07629, over 17415.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01063, ecapa_loss=0.0001396, whisper_loss=0.08858, over 3795236.46 frames. ], batch size: 77, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 21:21:26,577 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.24 vs. limit=10.0
2024-08-19 21:21:28,247 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.35 vs. limit=15.0
2024-08-19 21:21:32,515 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4537990.0, ans=0.125
2024-08-19 21:21:50,388 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 15 from LS+wenet, 14 from Vox, 21 from AS
2024-08-19 21:22:27,571 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-19 21:22:29,880 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4538290.0, ans=0.07
2024-08-19 21:22:39,568 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.332e+01 2.620e+01 2.969e+01 6.711e+01, threshold=5.240e+01, percent-clipped=2.0
2024-08-19 21:22:46,288 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 25 from LS+wenet, 24 from Vox, 26 from AS
2024-08-19 21:22:49,871 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 9250, loss[loss=0.0974, beats_loss=0.01128, ecapa_loss=0.0001296, whisper_loss=0.08483, over 17519.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01054, ecapa_loss=0.000141, whisper_loss=0.08907, over 3778474.51 frames. ], batch size: 73, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 21:22:54,558 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.61 vs. limit=15.0
2024-08-19 21:23:58,686 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 18 from LS+wenet, 21 from Vox, 30 from AS
2024-08-19 21:24:01,091 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 34 from LS+wenet, 21 from Vox, 25 from AS
2024-08-19 21:24:02,815 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 20 from LS+wenet, 20 from Vox, 40 from AS
2024-08-19 21:24:14,332 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4538790.0, ans=0.0
2024-08-19 21:24:14,710 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.21 vs. limit=22.5
2024-08-19 21:24:18,099 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4538790.0, ans=0.5
2024-08-19 21:24:25,338 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 9300, loss[loss=0.09632, beats_loss=0.009698, ecapa_loss=0.0001622, whisper_loss=0.085, over 22397.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01057, ecapa_loss=0.0001411, whisper_loss=0.08899, over 3786679.79 frames. ], batch size: 93, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 21:24:27,382 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 17 from LS+wenet, 16 from Vox, 25 from AS
2024-08-19 21:24:36,983 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.714e+01
2024-08-19 21:24:40,967 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4538890.0, ans=0.0
2024-08-19 21:24:43,115 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4538990.0, ans=0.125
2024-08-19 21:24:52,532 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.64 vs. limit=15.0
2024-08-19 21:24:54,129 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4538990.0, ans=0.1
2024-08-19 21:24:57,961 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=4538990.0, ans=15.0
2024-08-19 21:25:07,152 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 19 from LS+wenet, 17 from Vox, 36 from AS
2024-08-19 21:25:41,689 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 17 from LS+wenet, 18 from Vox, 35 from AS
2024-08-19 21:25:45,982 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=4539290.0, ans=0.5
2024-08-19 21:25:47,613 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 21 from LS+wenet, 17 from Vox, 49 from AS
2024-08-19 21:25:50,822 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.320e+01 2.550e+01 2.886e+01 6.142e+01, threshold=5.100e+01, percent-clipped=1.0
2024-08-19 21:25:53,620 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.37 vs. limit=15.0
2024-08-19 21:26:00,071 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 9350, loss[loss=0.1179, beats_loss=0.0113, ecapa_loss=0.0001725, whisper_loss=0.1049, over 21796.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01061, ecapa_loss=0.00014, whisper_loss=0.089, over 3821381.51 frames. ], batch size: 90, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 21:26:36,830 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4539590.0, ans=0.125
2024-08-19 21:27:33,476 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 9400, loss[loss=0.1076, beats_loss=0.007648, ecapa_loss=0.0001607, whisper_loss=0.09836, over 20909.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01056, ecapa_loss=0.0001408, whisper_loss=0.0892, over 3817262.20 frames. ], batch size: 85, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 21:27:36,555 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4539890.0, ans=0.125
2024-08-19 21:27:49,158 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 17 from LS+wenet, 28 from Vox, 22 from AS
2024-08-19 21:27:59,026 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.17 vs. limit=15.0
2024-08-19 21:28:15,762 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 16 from LS+wenet, 16 from Vox, 27 from AS
2024-08-19 21:28:23,763 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-19 21:28:40,371 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 21 from LS+wenet, 24 from Vox, 36 from AS
2024-08-19 21:28:42,734 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4540190.0, ans=0.0
2024-08-19 21:28:56,109 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.15 vs. limit=15.0
2024-08-19 21:28:57,131 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.227e+01 2.494e+01 2.747e+01 6.860e+01, threshold=4.987e+01, percent-clipped=1.0
2024-08-19 21:28:57,384 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 32 from LS+wenet, 14 from Vox, 38 from AS
2024-08-19 21:29:07,223 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 9450, loss[loss=0.08038, beats_loss=0.0119, ecapa_loss=0.0001139, whisper_loss=0.06734, over 12897.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01058, ecapa_loss=0.0001406, whisper_loss=0.08841, over 3801992.10 frames. ], batch size: 51, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 21:29:14,037 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4540390.0, ans=0.125
2024-08-19 21:29:18,616 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.37 vs. limit=15.0
2024-08-19 21:29:47,798 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.58 vs. limit=15.0
2024-08-19 21:29:57,361 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4540590.0, ans=0.2
2024-08-19 21:30:05,137 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 36 from LS+wenet, 23 from Vox, 31 from AS
2024-08-19 21:30:14,699 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 17 from LS+wenet, 20 from Vox, 22 from AS
2024-08-19 21:30:21,547 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4540690.0, ans=0.2
2024-08-19 21:30:34,532 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4540790.0, ans=0.0
2024-08-19 21:30:48,107 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 9500, loss[loss=0.1043, beats_loss=0.01057, ecapa_loss=0.0001387, whisper_loss=0.09239, over 21939.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01059, ecapa_loss=0.0001392, whisper_loss=0.08884, over 3787206.61 frames. ], batch size: 90, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 21:30:50,257 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4540890.0, ans=0.1
2024-08-19 21:30:53,638 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 16 from LS+wenet, 21 from Vox, 19 from AS
2024-08-19 21:31:02,399 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4540890.0, ans=0.0
2024-08-19 21:31:20,163 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 23 from LS+wenet, 20 from Vox, 33 from AS
2024-08-19 21:31:50,270 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.01 vs. limit=15.0
2024-08-19 21:31:54,413 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4541190.0, ans=0.09899494936611666
2024-08-19 21:31:58,480 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4541190.0, ans=0.0
2024-08-19 21:32:00,390 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4541190.0, ans=0.125
2024-08-19 21:32:04,661 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=4541190.0, ans=15.0
2024-08-19 21:32:17,488 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.722e+01 2.220e+01 2.482e+01 2.741e+01 4.040e+01, threshold=4.965e+01, percent-clipped=0.0
2024-08-19 21:32:22,264 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4541290.0, ans=0.125
2024-08-19 21:32:27,363 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 9550, loss[loss=0.1027, beats_loss=0.01176, ecapa_loss=0.0001453, whisper_loss=0.08944, over 21434.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01057, ecapa_loss=0.0001393, whisper_loss=0.08859, over 3772489.64 frames. ], batch size: 89, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 21:32:35,507 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 18 from LS+wenet, 14 from Vox, 21 from AS
2024-08-19 21:32:54,101 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.05 vs. limit=22.5
2024-08-19 21:33:06,645 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 23 from LS+wenet, 17 from Vox, 30 from AS
2024-08-19 21:33:14,611 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4541590.0, ans=0.2
2024-08-19 21:33:35,119 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 16 from LS+wenet, 24 from Vox, 24 from AS
2024-08-19 21:33:39,015 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 21 from LS+wenet, 16 from Vox, 31 from AS
2024-08-19 21:33:43,095 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4541790.0, ans=0.2
2024-08-19 21:33:52,070 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 25 from LS+wenet, 19 from Vox, 37 from AS
2024-08-19 21:34:03,024 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 9600, loss[loss=0.1003, beats_loss=0.01115, ecapa_loss=0.000152, whisper_loss=0.08764, over 20983.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01058, ecapa_loss=0.0001405, whisper_loss=0.08866, over 3819629.04 frames. ], batch size: 88, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 21:34:15,205 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 24 from LS+wenet, 30 from Vox, 34 from AS
2024-08-19 21:34:35,048 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 27 from LS+wenet, 27 from Vox, 38 from AS
2024-08-19 21:35:06,511 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 17 from LS+wenet, 20 from Vox, 22 from AS
2024-08-19 21:35:08,755 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 29 from LS+wenet, 26 from Vox, 36 from AS
2024-08-19 21:35:13,233 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4542190.0, ans=0.0
2024-08-19 21:35:23,189 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 39 from LS+wenet, 20 from Vox, 28 from AS
2024-08-19 21:35:26,682 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 20 from LS+wenet, 24 from Vox, 35 from AS
2024-08-19 21:35:34,827 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.797e+01 2.267e+01 2.509e+01 2.746e+01 4.719e+01, threshold=5.019e+01, percent-clipped=0.0
2024-08-19 21:35:45,882 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 9650, loss[loss=0.09116, beats_loss=0.01208, ecapa_loss=0.0001739, whisper_loss=0.07734, over 19924.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01049, ecapa_loss=0.000142, whisper_loss=0.08881, over 3795204.92 frames. ], batch size: 81, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 21:35:51,778 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4542390.0, ans=0.125
2024-08-19 21:36:34,189 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=9.277e+00
2024-08-19 21:36:36,133 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 21 from LS+wenet, 22 from Vox, 34 from AS
2024-08-19 21:36:49,361 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.62 vs. limit=10.0
2024-08-19 21:37:14,525 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4542790.0, ans=0.1
2024-08-19 21:37:29,853 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 9700, loss[loss=0.0785, beats_loss=0.011, ecapa_loss=0.0001187, whisper_loss=0.06632, over 16405.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01046, ecapa_loss=0.0001421, whisper_loss=0.08914, over 3816927.11 frames.
], batch size: 63, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:37:32,536 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4542890.0, ans=0.125 2024-08-19 21:37:52,126 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4542990.0, ans=0.125 2024-08-19 21:38:23,733 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 26 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-19 21:38:51,024 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 14 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-19 21:39:02,744 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.331e+01 2.511e+01 2.867e+01 4.334e+02, threshold=5.023e+01, percent-clipped=2.0 2024-08-19 21:39:07,240 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 27 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-19 21:39:12,105 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 9750, loss[loss=0.09444, beats_loss=0.01049, ecapa_loss=0.000133, whisper_loss=0.08262, over 15681.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01046, ecapa_loss=0.0001419, whisper_loss=0.08956, over 3806571.33 frames. ], batch size: 61, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:39:50,448 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.58 vs. 
limit=22.5 2024-08-19 21:39:53,805 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4543590.0, ans=0.125 2024-08-19 21:40:35,710 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4543790.0, ans=0.125 2024-08-19 21:40:47,563 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4543890.0, ans=0.125 2024-08-19 21:40:47,897 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=12.77 vs. limit=12.0 2024-08-19 21:40:48,535 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 9800, loss[loss=0.1123, beats_loss=0.009971, ecapa_loss=0.0001434, whisper_loss=0.1009, over 16605.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01045, ecapa_loss=0.0001415, whisper_loss=0.08951, over 3789463.92 frames. ], batch size: 64, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:41:24,813 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.77 vs. limit=15.0 2024-08-19 21:41:39,378 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4544090.0, ans=0.125 2024-08-19 21:41:40,986 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 31 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-19 21:41:52,801 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4544190.0, ans=0.2 2024-08-19 21:42:04,507 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.92 vs. 
limit=12.0 2024-08-19 21:42:16,137 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.232e+01 2.541e+01 2.711e+01 1.410e+02, threshold=5.082e+01, percent-clipped=1.0 2024-08-19 21:42:26,649 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 9850, loss[loss=0.1122, beats_loss=0.01121, ecapa_loss=0.0001332, whisper_loss=0.09964, over 22076.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01057, ecapa_loss=0.0001398, whisper_loss=0.08909, over 3779414.02 frames. ], batch size: 89, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:42:41,912 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.08 vs. limit=10.0 2024-08-19 21:43:13,612 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4544590.0, ans=0.1 2024-08-19 21:43:18,818 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4544690.0, ans=0.0 2024-08-19 21:43:18,862 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4544690.0, ans=0.125 2024-08-19 21:43:32,654 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4544690.0, ans=0.1 2024-08-19 21:43:57,278 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 9900, loss[loss=0.07449, beats_loss=0.01264, ecapa_loss=0.0001131, whisper_loss=0.06072, over 14985.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01056, ecapa_loss=0.000141, whisper_loss=0.08887, over 3771606.44 frames. 
], batch size: 61, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:44:01,045 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4544890.0, ans=0.125 2024-08-19 21:44:02,739 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.46 vs. limit=15.0 2024-08-19 21:44:04,835 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4544890.0, ans=0.125 2024-08-19 21:44:06,567 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4544890.0, ans=0.0 2024-08-19 21:44:17,752 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4544990.0, ans=0.1 2024-08-19 21:44:53,566 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4545190.0, ans=0.1 2024-08-19 21:44:54,633 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 25 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-19 21:45:05,751 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 15 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-19 21:45:17,915 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.712e+01 2.278e+01 2.511e+01 2.738e+01 3.827e+01, threshold=5.022e+01, percent-clipped=0.0 2024-08-19 21:45:25,780 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 9950, loss[loss=0.09868, beats_loss=0.009142, ecapa_loss=0.0001183, whisper_loss=0.08835, over 16639.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01056, ecapa_loss=0.0001409, whisper_loss=0.08845, over 3757425.47 frames. 
], batch size: 61, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:45:29,262 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4545390.0, ans=0.125 2024-08-19 21:45:44,633 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 16 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-19 21:45:45,051 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4545490.0, ans=0.0 2024-08-19 21:45:55,234 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4545490.0, ans=0.125 2024-08-19 21:46:00,894 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 24 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-19 21:46:16,755 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 35 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-19 21:46:39,910 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4545790.0, ans=0.0 2024-08-19 21:46:39,925 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4545790.0, ans=0.0 2024-08-19 21:46:56,078 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 10000, loss[loss=0.1165, beats_loss=0.0087, ecapa_loss=0.0001421, whisper_loss=0.1063, over 15948.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01055, ecapa_loss=0.0001415, whisper_loss=0.08865, over 3770366.63 frames. ], batch size: 62, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:46:57,293 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.30 vs. limit=15.0 2024-08-19 21:46:59,892 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 
28 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-19 21:47:18,237 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4545990.0, ans=0.0 2024-08-19 21:47:26,642 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4545990.0, ans=0.2 2024-08-19 21:48:17,680 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.220e+01 2.388e+01 2.607e+01 4.207e+01, threshold=4.776e+01, percent-clipped=0.0 2024-08-19 21:48:24,022 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 21 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-19 21:48:26,063 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4546390.0, ans=0.2 2024-08-19 21:48:26,803 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 10050, loss[loss=0.1105, beats_loss=0.01035, ecapa_loss=0.0001408, whisper_loss=0.09878, over 22529.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01046, ecapa_loss=0.0001416, whisper_loss=0.08929, over 3784435.15 frames. ], batch size: 91, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:48:30,048 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4546390.0, ans=0.0 2024-08-19 21:48:35,067 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 14 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-19 21:48:47,805 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4546490.0, ans=0.125 2024-08-19 21:49:27,401 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4546690.0, ans=0.1 2024-08-19 21:49:35,979 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 
30 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-19 21:49:57,066 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 10100, loss[loss=0.1096, beats_loss=0.01083, ecapa_loss=0.0001611, whisper_loss=0.09712, over 21603.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01064, ecapa_loss=0.0001405, whisper_loss=0.08938, over 3825627.34 frames. ], batch size: 90, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:50:01,203 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4546890.0, ans=0.0 2024-08-19 21:50:04,972 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4546890.0, ans=0.1 2024-08-19 21:50:47,699 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4547090.0, ans=0.0 2024-08-19 21:50:59,161 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.08 vs. limit=6.0 2024-08-19 21:51:21,703 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.291e+01 2.540e+01 2.770e+01 3.592e+02, threshold=5.079e+01, percent-clipped=1.0 2024-08-19 21:51:26,995 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 17 from LS+wenet, 21 from Vox, 15 fro AS 2024-08-19 21:51:30,715 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 10150, loss[loss=0.1074, beats_loss=0.008869, ecapa_loss=0.0001628, whisper_loss=0.09692, over 21130.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0106, ecapa_loss=0.000141, whisper_loss=0.08978, over 3815692.22 frames. ], batch size: 85, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:51:40,416 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.50 vs. 
limit=15.0 2024-08-19 21:51:40,449 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.41 vs. limit=15.0 2024-08-19 21:52:42,511 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 34 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-19 21:53:04,950 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 10200, loss[loss=0.07831, beats_loss=0.01021, ecapa_loss=0.000127, whisper_loss=0.06683, over 14352.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01055, ecapa_loss=0.0001415, whisper_loss=0.09023, over 3834540.57 frames. ], batch size: 57, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:53:12,303 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.44 vs. limit=22.5 2024-08-19 21:53:15,344 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 32 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-19 21:53:18,277 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4547890.0, ans=0.125 2024-08-19 21:53:19,670 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 15 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-19 21:53:23,929 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4547990.0, ans=0.0 2024-08-19 21:53:33,862 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 34 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-19 21:53:51,853 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.19 vs. 
limit=10.0 2024-08-19 21:54:05,085 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4548190.0, ans=0.125 2024-08-19 21:54:09,091 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4548190.0, ans=0.125 2024-08-19 21:54:13,000 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 26 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-19 21:54:14,626 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 16 from LS+wenet, 28 from Vox, 20 fro AS 2024-08-19 21:54:15,095 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.48 vs. limit=15.0 2024-08-19 21:54:18,289 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4548190.0, ans=0.0 2024-08-19 21:54:24,401 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 31 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-19 21:54:26,189 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4548290.0, ans=0.0 2024-08-19 21:54:32,480 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4548290.0, ans=0.1 2024-08-19 21:54:33,827 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.328e+01 2.553e+01 2.832e+01 3.795e+01, threshold=5.106e+01, percent-clipped=0.0 2024-08-19 21:54:37,810 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4548290.0, ans=0.95 2024-08-19 21:54:43,481 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 10250, loss[loss=0.08177, beats_loss=0.01203, ecapa_loss=0.0001476, whisper_loss=0.06826, over 22671.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.01041, ecapa_loss=0.0001433, whisper_loss=0.09067, over 3829091.69 frames. ], batch size: 95, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:54:57,360 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 20 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-19 21:55:08,028 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 27 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-19 21:55:08,402 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=4548490.0, ans=0.5 2024-08-19 21:55:25,911 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 22 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-19 21:55:45,411 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4548690.0, ans=0.0 2024-08-19 21:56:05,954 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 19 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-19 21:56:14,376 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4548790.0, ans=0.0 2024-08-19 21:56:21,614 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 10300, loss[loss=0.1078, beats_loss=0.009154, ecapa_loss=0.0001502, whisper_loss=0.09717, over 20015.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01045, ecapa_loss=0.0001433, whisper_loss=0.09074, over 3843174.45 frames. ], batch size: 78, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:56:22,106 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4548890.0, ans=0.0 2024-08-19 21:56:22,697 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.47 vs. limit=22.5 2024-08-19 21:56:25,835 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 
35 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-19 21:56:41,336 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.04 vs. limit=22.5 2024-08-19 21:56:45,197 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4548990.0, ans=0.125 2024-08-19 21:57:00,039 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 21 from LS+wenet, 13 from Vox, 19 fro AS 2024-08-19 21:57:23,811 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4549190.0, ans=0.1 2024-08-19 21:57:47,275 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.41 vs. limit=15.0 2024-08-19 21:57:50,421 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.783e+01 2.288e+01 2.512e+01 2.813e+01 4.612e+01, threshold=5.024e+01, percent-clipped=0.0 2024-08-19 21:57:59,992 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 10350, loss[loss=0.09423, beats_loss=0.01223, ecapa_loss=0.0001193, whisper_loss=0.0808, over 21715.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01048, ecapa_loss=0.0001416, whisper_loss=0.09127, over 3837631.29 frames. ], batch size: 87, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:58:29,924 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 19 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-19 21:58:42,274 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. 
limit=6.0 2024-08-19 21:59:09,160 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4549690.0, ans=0.2 2024-08-19 21:59:14,947 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4549690.0, ans=0.0 2024-08-19 21:59:35,873 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 19 from LS+wenet, 11 from Vox, 20 fro AS 2024-08-19 21:59:39,690 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 10400, loss[loss=0.07405, beats_loss=0.01307, ecapa_loss=0.0001722, whisper_loss=0.05926, over 11755.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01055, ecapa_loss=0.0001402, whisper_loss=0.0902, over 3825332.14 frames. ], batch size: 50, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:59:48,047 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4549890.0, ans=0.0 2024-08-19 21:59:49,804 WARNING [optim.py:496] (1/4) Scaling gradients by 0.09229938685894012, model_norm_threshold=50.23906326293945 2024-08-19 21:59:49,959 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.15, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.352e+04, grad_sumsq=4.110e+06, orig_rms_sq=1.059e-02 2024-08-19 22:00:06,918 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.64 vs. limit=22.5 2024-08-19 22:01:00,121 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 
32 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-19 22:01:06,914 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4550290.0, ans=0.1 2024-08-19 22:01:14,095 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.360e+01 2.658e+01 2.930e+01 5.443e+02, threshold=5.316e+01, percent-clipped=2.0 2024-08-19 22:01:23,791 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 10450, loss[loss=0.1106, beats_loss=0.008105, ecapa_loss=0.0001525, whisper_loss=0.101, over 18737.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01054, ecapa_loss=0.0001406, whisper_loss=0.0905, over 3826638.42 frames. ], batch size: 72, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:02:01,331 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4550490.0, ans=0.125 2024-08-19 22:02:06,992 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-19 22:02:07,037 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4550590.0, ans=0.125 2024-08-19 22:02:16,359 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 30 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-19 22:02:21,554 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 28 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-19 22:02:41,102 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4550690.0, ans=0.125 2024-08-19 22:02:43,628 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 28 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-19 22:02:48,910 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4550790.0, ans=0.2 2024-08-19 22:03:00,774 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 
24 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-19 22:03:09,019 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 10500, loss[loss=0.101, beats_loss=0.008749, ecapa_loss=0.0001656, whisper_loss=0.09062, over 22364.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01046, ecapa_loss=0.0001415, whisper_loss=0.09057, over 3830410.63 frames. ], batch size: 93, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:03:36,791 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 19 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-19 22:03:50,365 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 26 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-19 22:04:04,057 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4551090.0, ans=0.025 2024-08-19 22:04:04,452 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.83 vs. limit=15.0 2024-08-19 22:04:06,528 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4551090.0, ans=0.125 2024-08-19 22:04:08,731 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4551190.0, ans=0.1 2024-08-19 22:04:21,878 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4551190.0, ans=0.125 2024-08-19 22:04:43,914 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.385e+01 2.732e+01 3.050e+01 1.536e+02, threshold=5.464e+01, percent-clipped=1.0 2024-08-19 22:04:55,217 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 10550, loss[loss=0.09716, beats_loss=0.01018, ecapa_loss=0.0001502, whisper_loss=0.08548, over 21352.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.01046, ecapa_loss=0.0001411, whisper_loss=0.09022, over 3841088.17 frames. ], batch size: 89, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 22:05:01,346 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.30 vs. limit=10.0
2024-08-19 22:05:10,132 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4551390.0, ans=0.0
2024-08-19 22:05:17,012 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4551490.0, ans=0.125
2024-08-19 22:05:25,188 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.61 vs. limit=15.0
2024-08-19 22:06:16,916 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4551690.0, ans=0.1
2024-08-19 22:06:28,237 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 23 from LS+wenet, 25 from Vox, 41 from AS
2024-08-19 22:06:33,589 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.35 vs. limit=15.0
2024-08-19 22:06:40,436 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 30 from LS+wenet, 19 from Vox, 35 from AS
2024-08-19 22:06:42,523 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 27 from LS+wenet, 20 from Vox, 28 from AS
2024-08-19 22:06:46,834 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 10600, loss[loss=0.1095, beats_loss=0.008237, ecapa_loss=0.0001132, whisper_loss=0.1001, over 15339.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01048, ecapa_loss=0.0001407, whisper_loss=0.08962, over 3820010.82 frames. ], batch size: 56, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 22:07:28,122 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4552090.0, ans=0.0
2024-08-19 22:07:45,239 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4552090.0, ans=0.125
2024-08-19 22:07:52,615 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 21 from LS+wenet, 23 from Vox, 32 from AS
2024-08-19 22:08:00,369 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.99 vs. limit=22.5
2024-08-19 22:08:07,775 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4552190.0, ans=0.0
2024-08-19 22:08:14,614 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.37 vs. limit=15.0
2024-08-19 22:08:18,570 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4552290.0, ans=0.125
2024-08-19 22:08:20,623 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4552290.0, ans=0.0
2024-08-19 22:08:21,878 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.619e+01 2.245e+01 2.495e+01 2.796e+01 7.063e+01, threshold=4.991e+01, percent-clipped=1.0
2024-08-19 22:08:22,921 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4552290.0, ans=0.2
2024-08-19 22:08:32,209 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 10650, loss[loss=0.0956, beats_loss=0.01205, ecapa_loss=0.0001427, whisper_loss=0.08212, over 21727.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01054, ecapa_loss=0.0001404, whisper_loss=0.08915, over 3810338.44 frames. ], batch size: 91, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 22:08:47,278 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4552390.0, ans=0.2
2024-08-19 22:08:55,387 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.39 vs. limit=15.0
2024-08-19 22:09:22,373 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4552590.0, ans=0.2
2024-08-19 22:09:48,502 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4552690.0, ans=0.0
2024-08-19 22:10:05,166 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4552790.0, ans=0.125
2024-08-19 22:10:11,271 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4552790.0, ans=0.0
2024-08-19 22:10:15,043 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 10700, loss[loss=0.1048, beats_loss=0.01021, ecapa_loss=0.0001314, whisper_loss=0.09329, over 13683.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01053, ecapa_loss=0.0001403, whisper_loss=0.08869, over 3770169.60 frames. ], batch size: 52, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 22:10:18,735 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 27 from LS+wenet, 20 from Vox, 46 from AS
2024-08-19 22:10:54,310 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 17 from LS+wenet, 14 from Vox, 33 from AS
2024-08-19 22:11:05,870 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 17 from LS+wenet, 14 from Vox, 34 from AS
2024-08-19 22:11:10,303 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 13 from LS+wenet, 20 from Vox, 38 from AS
2024-08-19 22:11:48,305 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4553290.0, ans=0.2
2024-08-19 22:11:54,023 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.715e+01 2.310e+01 2.569e+01 2.819e+01 4.934e+01, threshold=5.139e+01, percent-clipped=0.0
2024-08-19 22:12:03,572 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 10750, loss[loss=0.1086, beats_loss=0.008459, ecapa_loss=0.000158, whisper_loss=0.09856, over 14907.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.0105, ecapa_loss=0.0001396, whisper_loss=0.08862, over 3795888.96 frames. ], batch size: 58, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 22:12:04,273 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4553390.0, ans=0.0
2024-08-19 22:12:07,184 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.32 vs. limit=15.0
2024-08-19 22:12:11,583 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4553390.0, ans=0.1
2024-08-19 22:12:11,645 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4553390.0, ans=0.125
2024-08-19 22:12:42,477 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4553490.0, ans=0.1
2024-08-19 22:12:49,852 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts.
15 from LS+wenet, 15 from Vox, 20 from AS
2024-08-19 22:13:17,244 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4553690.0, ans=0.1
2024-08-19 22:13:33,471 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4553790.0, ans=0.1
2024-08-19 22:13:43,941 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.94 vs. limit=15.0
2024-08-19 22:13:44,352 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 10800, loss[loss=0.09037, beats_loss=0.009817, ecapa_loss=0.0001445, whisper_loss=0.0791, over 16260.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01051, ecapa_loss=0.0001394, whisper_loss=0.08865, over 3806280.62 frames. ], batch size: 66, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 22:13:54,301 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.79 vs. limit=15.0
2024-08-19 22:14:12,256 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.16 vs. limit=15.0
2024-08-19 22:14:27,416 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 14 from LS+wenet, 17 from Vox, 26 from AS
2024-08-19 22:15:08,338 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.53 vs. limit=15.0
2024-08-19 22:15:16,353 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.94 vs. limit=15.0
2024-08-19 22:15:16,913 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.322e+01 2.506e+01 2.895e+01 8.208e+01, threshold=5.011e+01, percent-clipped=1.0
2024-08-19 22:15:26,480 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 10850, loss[loss=0.1348, beats_loss=0.007062, ecapa_loss=0.0001391, whisper_loss=0.1264, over 14536.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01046, ecapa_loss=0.0001404, whisper_loss=0.08899, over 3769125.04 frames. ], batch size: 54, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 22:15:26,712 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 30 from LS+wenet, 26 from Vox, 33 from AS
2024-08-19 22:15:29,184 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4554390.0, ans=0.125
2024-08-19 22:15:46,739 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4554490.0, ans=0.125
2024-08-19 22:15:59,825 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 20 from LS+wenet, 25 from Vox, 46 from AS
2024-08-19 22:16:07,216 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 27 from LS+wenet, 22 from Vox, 41 from AS
2024-08-19 22:16:07,469 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4554590.0, ans=0.2
2024-08-19 22:16:28,533 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4554690.0, ans=0.125
2024-08-19 22:16:37,097 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.85 vs. limit=6.0
2024-08-19 22:16:48,893 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 22 from LS+wenet, 21 from Vox, 41 from AS
2024-08-19 22:16:53,262 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4554790.0, ans=0.0
2024-08-19 22:17:02,315 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 18 from LS+wenet, 15 from Vox, 38 from AS
2024-08-19 22:17:11,308 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 10900, loss[loss=0.1236, beats_loss=0.008794, ecapa_loss=0.0001291, whisper_loss=0.1135, over 22788.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01046, ecapa_loss=0.0001393, whisper_loss=0.08957, over 3782949.54 frames. ], batch size: 89, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 22:17:13,051 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 30 from LS+wenet, 17 from Vox, 44 from AS
2024-08-19 22:17:23,899 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4554890.0, ans=0.125
2024-08-19 22:17:34,109 INFO [train_multi_KD3.py:845] (1/4) A total of 49 cuts. 10 from LS+wenet, 14 from Vox, 25 from AS
2024-08-19 22:17:49,938 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4554990.0, ans=0.09899494936611666
2024-08-19 22:17:49,960 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4554990.0, ans=0.125
2024-08-19 22:18:24,341 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4555190.0, ans=0.125
2024-08-19 22:18:40,038 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 23 from LS+wenet, 19 from Vox, 38 from AS
2024-08-19 22:18:44,130 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts.
16 from LS+wenet, 20 from Vox, 27 from AS
2024-08-19 22:18:51,350 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.820e+01 2.334e+01 2.577e+01 2.862e+01 5.392e+01, threshold=5.154e+01, percent-clipped=2.0
2024-08-19 22:19:03,222 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 10950, loss[loss=0.1016, beats_loss=0.009597, ecapa_loss=0.0001655, whisper_loss=0.09039, over 19999.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01052, ecapa_loss=0.0001392, whisper_loss=0.08944, over 3815088.42 frames. ], batch size: 84, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 22:19:03,631 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4555390.0, ans=0.0
2024-08-19 22:19:05,883 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. limit=6.0
2024-08-19 22:19:25,616 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 16 from LS+wenet, 10 from Vox, 24 from AS
2024-08-19 22:19:30,255 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-19 22:20:10,037 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.62 vs. limit=15.0
2024-08-19 22:20:17,958 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4555690.0, ans=0.1
2024-08-19 22:20:40,007 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.40 vs. limit=22.5
2024-08-19 22:20:46,103 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.83 vs. limit=15.0
2024-08-19 22:20:57,495 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 11000, loss[loss=0.08231, beats_loss=0.01317, ecapa_loss=0.0001103, whisper_loss=0.06804, over 14758.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0105, ecapa_loss=0.0001396, whisper_loss=0.09005, over 3813474.21 frames. ], batch size: 59, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 22:20:57,699 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 24 from LS+wenet, 21 from Vox, 30 from AS
2024-08-19 22:21:11,733 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 28 from LS+wenet, 21 from Vox, 37 from AS
2024-08-19 22:21:32,910 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.79 vs. limit=15.0
2024-08-19 22:21:39,328 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4555990.0, ans=0.125
2024-08-19 22:21:45,887 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4556090.0, ans=0.0
2024-08-19 22:22:12,557 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.84 vs. limit=15.0
2024-08-19 22:22:21,274 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 27 from LS+wenet, 17 from Vox, 32 from AS
2024-08-19 22:22:28,132 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4556290.0, ans=0.1
2024-08-19 22:22:40,251 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.290e+01 2.533e+01 2.932e+01 4.213e+01, threshold=5.066e+01, percent-clipped=0.0
2024-08-19 22:22:51,777 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 11050, loss[loss=0.09859, beats_loss=0.01046, ecapa_loss=0.000135, whisper_loss=0.08678, over 22004.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01052, ecapa_loss=0.0001397, whisper_loss=0.0901, over 3829151.11 frames. ], batch size: 89, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 22:23:16,824 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 25 from LS+wenet, 15 from Vox, 40 from AS
2024-08-19 22:23:32,000 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 28 from LS+wenet, 19 from Vox, 44 from AS
2024-08-19 22:24:35,536 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4556790.0, ans=0.125
2024-08-19 22:24:39,366 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 20 from LS+wenet, 8 from Vox, 39 from AS
2024-08-19 22:24:45,109 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4556890.0, ans=0.125
2024-08-19 22:24:45,935 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 11100, loss[loss=0.1066, beats_loss=0.009765, ecapa_loss=0.000161, whisper_loss=0.09527, over 20620.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01053, ecapa_loss=0.0001379, whisper_loss=0.09052, over 3846794.13 frames. ], batch size: 81, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 22:24:52,512 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=4556890.0, ans=15.0
2024-08-19 22:24:57,007 INFO [train_multi_KD3.py:845] (1/4) A total of 54 cuts. 12 from LS+wenet, 17 from Vox, 25 from AS
2024-08-19 22:25:07,432 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-19 22:25:10,093 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4556990.0, ans=0.125
2024-08-19 22:25:45,283 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=4557090.0, ans=0.05
2024-08-19 22:26:10,388 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 29 from LS+wenet, 26 from Vox, 35 from AS
2024-08-19 22:26:14,715 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-19 22:26:22,536 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4557290.0, ans=0.1
2024-08-19 22:26:30,297 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.214e+01 2.398e+01 2.634e+01 3.893e+01, threshold=4.795e+01, percent-clipped=0.0
2024-08-19 22:26:41,234 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 11150, loss[loss=0.09535, beats_loss=0.009498, ecapa_loss=0.0001178, whisper_loss=0.08467, over 14434.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0105, ecapa_loss=0.0001383, whisper_loss=0.09074, over 3885960.62 frames. ], batch size: 55, lr: 1.96e-03, grad_scale: 5.764607523034235e+17
2024-08-19 22:26:52,919 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4557390.0, ans=0.1
2024-08-19 22:27:08,741 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts.
31 from LS+wenet, 23 from Vox, 35 from AS
2024-08-19 22:27:20,651 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.992e+00
2024-08-19 22:27:40,424 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4557590.0, ans=0.125
2024-08-19 22:27:55,333 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4557690.0, ans=0.125
2024-08-19 22:28:10,929 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4557690.0, ans=0.1
2024-08-19 22:28:34,911 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4557790.0, ans=0.125
2024-08-19 22:28:39,602 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 11200, loss[loss=0.1148, beats_loss=0.008469, ecapa_loss=0.0001441, whisper_loss=0.1049, over 14336.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01041, ecapa_loss=0.0001399, whisper_loss=0.0913, over 3884542.96 frames. ], batch size: 54, lr: 1.96e-03, grad_scale: 5.764607523034235e+17
2024-08-19 22:28:45,029 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4557890.0, ans=0.125
2024-08-19 22:28:52,422 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4557890.0, ans=0.125
2024-08-19 22:29:20,161 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 34 from LS+wenet, 21 from Vox, 34 from AS
2024-08-19 22:29:35,759 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 21 from LS+wenet, 24 from Vox, 30 from AS
2024-08-19 22:29:51,513 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4558090.0, ans=0.125
2024-08-19 22:30:09,309 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 28 from LS+wenet, 17 from Vox, 33 from AS
2024-08-19 22:30:10,039 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.04 vs. limit=10.0
2024-08-19 22:30:12,303 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 16 from LS+wenet, 20 from Vox, 24 from AS
2024-08-19 22:30:34,223 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.555e+01 2.356e+01 2.585e+01 2.931e+01 9.399e+01, threshold=5.169e+01, percent-clipped=1.0
2024-08-19 22:30:47,228 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 11250, loss[loss=0.09404, beats_loss=0.01134, ecapa_loss=0.0001491, whisper_loss=0.08121, over 17159.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01045, ecapa_loss=0.0001403, whisper_loss=0.09119, over 3886144.29 frames. ], batch size: 69, lr: 1.96e-03, grad_scale: 5.764607523034235e+17
2024-08-19 22:31:10,377 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 30 from LS+wenet, 22 from Vox, 29 from AS
2024-08-19 22:31:25,736 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 21 from LS+wenet, 27 from Vox, 30 from AS
2024-08-19 22:31:26,795 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4558490.0, ans=0.0
2024-08-19 22:32:14,053 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 20 from LS+wenet, 14 from Vox, 26 from AS
2024-08-19 22:32:29,609 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4558790.0, ans=0.0
2024-08-19 22:32:46,935 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 11300, loss[loss=0.09168, beats_loss=0.01057, ecapa_loss=0.0001275, whisper_loss=0.07983, over 19791.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01042, ecapa_loss=0.0001405, whisper_loss=0.09117, over 3860417.99 frames. ], batch size: 76, lr: 1.96e-03, grad_scale: 5.764607523034235e+17
2024-08-19 22:33:26,182 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.39 vs. limit=15.0
2024-08-19 22:33:27,778 INFO [train_multi_KD3.py:845] (1/4) A total of 54 cuts. 22 from LS+wenet, 10 from Vox, 22 from AS
2024-08-19 22:33:33,784 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 19 from LS+wenet, 16 from Vox, 18 from AS
2024-08-19 22:33:34,111 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4559090.0, ans=0.125
2024-08-19 22:33:51,133 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 23 from LS+wenet, 17 from Vox, 43 from AS
2024-08-19 22:34:03,381 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4559190.0, ans=0.2
2024-08-19 22:34:31,779 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 28 from LS+wenet, 25 from Vox, 40 from AS
2024-08-19 22:34:36,628 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.736e+01 2.198e+01 2.361e+01 2.678e+01 4.685e+01, threshold=4.723e+01, percent-clipped=0.0
2024-08-19 22:34:48,376 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 11350, loss[loss=0.09988, beats_loss=0.01136, ecapa_loss=0.0001226, whisper_loss=0.08729, over 19430.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01043, ecapa_loss=0.0001397, whisper_loss=0.09092, over 3859098.00 frames. ], batch size: 75, lr: 1.96e-03, grad_scale: 5.764607523034235e+17
2024-08-19 22:34:54,894 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.30 vs. limit=15.0
2024-08-19 22:35:23,977 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4559490.0, ans=0.125
2024-08-19 22:35:38,044 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-19 22:35:49,948 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4559590.0, ans=0.0
2024-08-19 22:36:01,689 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4559690.0, ans=0.125
2024-08-19 22:36:48,986 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.48 vs. limit=15.0
2024-08-19 22:36:51,792 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 11400, loss[loss=0.09975, beats_loss=0.0107, ecapa_loss=0.0001531, whisper_loss=0.08752, over 12926.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01035, ecapa_loss=0.0001398, whisper_loss=0.09076, over 3828240.85 frames. ], batch size: 53, lr: 1.96e-03, grad_scale: 5.764607523034235e+17
2024-08-19 22:37:23,712 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4559990.0, ans=0.0
2024-08-19 22:37:33,095 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts.
19 from LS+wenet, 20 from Vox, 37 from AS
2024-08-19 22:38:06,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4560090.0, ans=0.125
2024-08-19 22:38:44,544 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.308e+01 2.574e+01 2.916e+01 2.255e+02, threshold=5.148e+01, percent-clipped=1.0
2024-08-19 22:38:55,714 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4560390.0, ans=0.0
2024-08-19 22:38:56,445 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 11450, loss[loss=0.107, beats_loss=0.007816, ecapa_loss=0.0001254, whisper_loss=0.09795, over 13822.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01031, ecapa_loss=0.0001405, whisper_loss=0.09171, over 3876402.40 frames. ], batch size: 51, lr: 1.96e-03, grad_scale: 5.764607523034235e+17
2024-08-19 22:39:00,062 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4560390.0, ans=0.125
2024-08-19 22:39:40,052 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.29 vs. limit=15.0
2024-08-19 22:40:00,318 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.20 vs. limit=15.0
2024-08-19 22:40:12,263 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4560690.0, ans=0.125
2024-08-19 22:40:17,482 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4560690.0, ans=0.125
2024-08-19 22:40:21,234 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 22 from LS+wenet, 22 from Vox, 24 from AS
2024-08-19 22:40:33,919 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.56 vs. limit=22.5
2024-08-19 22:40:38,281 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.35 vs. limit=15.0
2024-08-19 22:40:55,153 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4560890.0, ans=0.125
2024-08-19 22:40:55,968 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 11500, loss[loss=0.08688, beats_loss=0.01308, ecapa_loss=0.0001285, whisper_loss=0.07251, over 22916.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01021, ecapa_loss=0.0001423, whisper_loss=0.0915, over 3857321.05 frames. ], batch size: 94, lr: 1.96e-03, grad_scale: 5.764607523034235e+17
2024-08-19 22:41:09,406 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4560890.0, ans=0.1
2024-08-19 22:41:37,480 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 26 from LS+wenet, 21 from Vox, 36 from AS
2024-08-19 22:41:43,971 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 14 from LS+wenet, 20 from Vox, 22 from AS
2024-08-19 22:42:07,178 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 16 from LS+wenet, 18 from Vox, 27 from AS
2024-08-19 22:42:29,641 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4561290.0, ans=0.125
2024-08-19 22:42:39,955 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.674e+01 2.243e+01 2.481e+01 2.784e+01 2.057e+02, threshold=4.962e+01, percent-clipped=3.0
2024-08-19 22:42:48,335 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4561290.0, ans=0.125
2024-08-19 22:42:51,361 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 11550, loss[loss=0.084, beats_loss=0.01214, ecapa_loss=0.0001308, whisper_loss=0.07056, over 15757.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01021, ecapa_loss=0.0001418, whisper_loss=0.09125, over 3834761.08 frames. ], batch size: 64, lr: 1.96e-03, grad_scale: 5.764607523034235e+17
2024-08-19 22:43:19,029 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.26 vs. limit=15.0
2024-08-19 22:43:33,453 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.07 vs. limit=15.0
2024-08-19 22:43:33,461 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.82 vs. limit=22.5
2024-08-19 22:43:53,249 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4561590.0, ans=0.2
2024-08-19 22:43:54,926 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4561590.0, ans=0.125
2024-08-19 22:44:12,967 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts.
40 from LS+wenet, 17 from Vox, 34 from AS
2024-08-19 22:44:14,215 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4561690.0, ans=0.125
2024-08-19 22:44:18,980 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.32 vs. limit=15.0
2024-08-19 22:44:30,732 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 19 from LS+wenet, 20 from Vox, 28 from AS
2024-08-19 22:44:43,862 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 11600, loss[loss=0.113, beats_loss=0.00963, ecapa_loss=0.0001393, whisper_loss=0.102, over 21895.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01025, ecapa_loss=0.0001411, whisper_loss=0.09161, over 3885122.87 frames. ], batch size: 88, lr: 1.96e-03, grad_scale: 5.764607523034235e+17
2024-08-19 22:44:56,670 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4561890.0, ans=0.1
2024-08-19 22:45:15,781 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 29 from LS+wenet, 21 from Vox, 44 from AS
2024-08-19 22:45:31,354 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4562090.0, ans=0.0
2024-08-19 22:45:36,412 INFO [train_multi_KD3.py:845] (1/4) A total of 49 cuts. 13 from LS+wenet, 15 from Vox, 21 from AS
2024-08-19 22:45:43,618 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 17 from LS+wenet, 13 from Vox, 25 from AS
2024-08-19 22:45:43,817 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4562190.0, ans=0.0
2024-08-19 22:45:47,423 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4562190.0, ans=0.0
2024-08-19 22:45:53,087 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4562190.0, ans=0.0
2024-08-19 22:45:55,071 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.28 vs. limit=22.5
2024-08-19 22:46:10,266 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.615e+01 2.277e+01 2.484e+01 2.833e+01 4.332e+01, threshold=4.968e+01, percent-clipped=0.0
2024-08-19 22:46:19,760 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 11650, loss[loss=0.09046, beats_loss=0.01096, ecapa_loss=0.0001557, whisper_loss=0.07795, over 18556.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0103, ecapa_loss=0.0001414, whisper_loss=0.09129, over 3868472.70 frames. ], batch size: 80, lr: 1.96e-03, grad_scale: 5.764607523034235e+17
2024-08-19 22:46:22,303 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4562390.0, ans=0.125
2024-08-19 22:46:27,457 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4562390.0, ans=0.2
2024-08-19 22:46:32,592 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 15 from LS+wenet, 20 from Vox, 20 from AS
2024-08-19 22:46:35,288 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.80 vs. limit=15.0
2024-08-19 22:46:47,296 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4562490.0, ans=0.0
2024-08-19 22:46:51,385 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4562490.0, ans=0.125
2024-08-19 22:46:51,447 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4562490.0, ans=0.0
2024-08-19 22:47:23,532 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4562690.0, ans=0.125
2024-08-19 22:47:26,881 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4562690.0, ans=0.125
2024-08-19 22:47:41,653 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4562790.0, ans=0.04949747468305833
2024-08-19 22:47:43,605 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.09 vs. limit=15.0
2024-08-19 22:47:48,936 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4562790.0, ans=0.125
2024-08-19 22:47:54,227 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.46 vs. limit=15.0
2024-08-19 22:48:01,565 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 11700, loss[loss=0.1031, beats_loss=0.01028, ecapa_loss=0.0001488, whisper_loss=0.09137, over 21940.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01036, ecapa_loss=0.0001414, whisper_loss=0.09065, over 3850820.15 frames. ], batch size: 89, lr: 1.96e-03, grad_scale: 5.764607523034235e+17
2024-08-19 22:48:08,185 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 35 from LS+wenet, 27 from Vox, 30 from AS
2024-08-19 22:48:19,122 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4562890.0, ans=0.1
2024-08-19 22:48:36,654 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 31 from LS+wenet, 31 from Vox, 29 from AS
2024-08-19 22:48:48,042 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.77 vs. limit=15.0
2024-08-19 22:48:50,994 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.25 vs. limit=10.0
2024-08-19 22:49:07,738 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 16 from LS+wenet, 18 from Vox, 34 from AS
2024-08-19 22:49:08,379 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4563190.0, ans=0.2
2024-08-19 22:49:12,975 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4563190.0, ans=0.0
2024-08-19 22:49:17,024 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4563190.0, ans=0.125
2024-08-19 22:49:24,223 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 29 from LS+wenet, 20 from Vox, 37 from AS
2024-08-19 22:49:34,908 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.355e+01 2.582e+01 2.983e+01 7.922e+01, threshold=5.164e+01, percent-clipped=1.0
2024-08-19 22:49:47,372 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 11750, loss[loss=0.1321, beats_loss=0.009002, ecapa_loss=0.000159, whisper_loss=0.1215, over 18513.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01032, ecapa_loss=0.0001418, whisper_loss=0.09114, over 3883706.53 frames.
], batch size: 70, lr: 1.96e-03, grad_scale: 1.152921504606847e+18 2024-08-19 22:49:50,852 INFO [train_multi_KD3.py:845] (1/4) A total of 96 cuts. 30 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-19 22:50:06,333 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.78 vs. limit=12.0 2024-08-19 22:50:49,923 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4563590.0, ans=0.0 2024-08-19 22:50:52,406 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 27 from LS+wenet, 11 from Vox, 35 fro AS 2024-08-19 22:51:11,716 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=4563690.0, ans=0.025 2024-08-19 22:51:39,243 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4563790.0, ans=0.0 2024-08-19 22:51:42,168 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 11800, loss[loss=0.09566, beats_loss=0.0107, ecapa_loss=0.0001475, whisper_loss=0.08349, over 19595.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01033, ecapa_loss=0.0001412, whisper_loss=0.09103, over 3872508.67 frames. ], batch size: 79, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:52:00,927 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4563890.0, ans=0.0 2024-08-19 22:52:10,454 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 22 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-19 22:52:33,527 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.41 vs. limit=15.0 2024-08-19 22:53:02,873 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.74 vs. 
limit=10.0 2024-08-19 22:53:23,645 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4564290.0, ans=0.0 2024-08-19 22:53:24,440 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.770e+01 2.230e+01 2.402e+01 2.738e+01 3.572e+01, threshold=4.803e+01, percent-clipped=0.0 2024-08-19 22:53:24,746 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 30 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-19 22:53:31,416 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4564290.0, ans=0.2 2024-08-19 22:53:31,538 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.02 vs. limit=15.0 2024-08-19 22:53:33,755 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 11850, loss[loss=0.1083, beats_loss=0.01073, ecapa_loss=0.0001212, whisper_loss=0.0964, over 17428.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01033, ecapa_loss=0.000141, whisper_loss=0.0909, over 3846955.37 frames. ], batch size: 67, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:53:54,575 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.88 vs. limit=15.0 2024-08-19 22:54:25,285 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2024-08-19 22:54:37,670 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4564590.0, ans=0.125 2024-08-19 22:54:45,386 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 29 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-19 22:54:47,763 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 
33 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-19 22:54:57,032 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 17 from LS+wenet, 10 from Vox, 34 fro AS 2024-08-19 22:55:11,807 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.16 vs. limit=12.0 2024-08-19 22:55:19,681 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4564790.0, ans=0.0 2024-08-19 22:55:25,496 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 11900, loss[loss=0.104, beats_loss=0.01012, ecapa_loss=0.0001458, whisper_loss=0.09239, over 21774.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01044, ecapa_loss=0.0001394, whisper_loss=0.09065, over 3855718.64 frames. ], batch size: 83, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:55:29,772 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 13 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-19 22:55:30,825 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=4564890.0, ans=0.025 2024-08-19 22:55:41,050 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4564890.0, ans=0.125 2024-08-19 22:56:15,998 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4565090.0, ans=0.09899494936611666 2024-08-19 22:56:18,057 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4565090.0, ans=0.05 2024-08-19 22:56:39,586 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4565190.0, ans=0.0 2024-08-19 22:56:45,036 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 
34 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-19 22:56:49,330 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 30 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-19 22:57:06,952 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.651e+01 2.317e+01 2.609e+01 2.861e+01 6.356e+01, threshold=5.219e+01, percent-clipped=1.0 2024-08-19 22:57:15,622 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 11950, loss[loss=0.1087, beats_loss=0.008998, ecapa_loss=0.0001726, whisper_loss=0.09794, over 22206.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01044, ecapa_loss=0.0001404, whisper_loss=0.09037, over 3845465.37 frames. ], batch size: 92, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:57:16,592 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4565390.0, ans=0.0 2024-08-19 22:57:31,942 INFO [train_multi_KD3.py:845] (1/4) A total of 52 cuts. 15 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-19 22:57:36,564 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 30 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-19 22:57:39,093 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4565490.0, ans=0.1 2024-08-19 22:57:46,237 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4565490.0, ans=0.0 2024-08-19 22:58:14,700 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4565590.0, ans=0.0 2024-08-19 22:58:21,625 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 24 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-19 22:58:54,042 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 
12 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-19 22:58:58,547 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.32 vs. limit=10.0 2024-08-19 22:58:58,905 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 12000, loss[loss=0.1044, beats_loss=0.01097, ecapa_loss=0.0001367, whisper_loss=0.09205, over 13119.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01045, ecapa_loss=0.0001403, whisper_loss=0.09035, over 3832650.46 frames. ], batch size: 54, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:58:58,905 INFO [train_multi_KD3.py:1140] (1/4) Computing validation loss 2024-08-19 22:59:32,058 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.4929, 4.2943, 3.4968, 3.8257], device='cuda:1') 2024-08-19 22:59:35,973 INFO [train_multi_KD3.py:1150] (1/4) Epoch 31, validation on ASR_libri: loss=0.2535, beats_loss=0, ecapa_loss=0.0005134, whisper_loss=0.2483, over 931116.00 frames. 2024-08-19 23:00:01,491 INFO [train_multi_KD3.py:1150] (1/4) Epoch 31, validation on SV_voxceleb1: loss=0.003987, beats_loss=0, ecapa_loss=0.0003987, whisper_loss=0, over 944235.00 frames. 2024-08-19 23:01:39,462 INFO [train_multi_KD3.py:1150] (1/4) Epoch 31, validation on AT_audioset: loss=0.02302, beats_loss=0.02302, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 23:01:39,466 INFO [train_multi_KD3.py:1156] (1/4) Maximum memory allocated so far is 30838MB 2024-08-19 23:01:45,231 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 30 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-19 23:01:46,269 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4565890.0, ans=0.125 2024-08-19 23:01:56,049 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 
21 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-19 23:02:10,158 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 26 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-19 23:02:11,233 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4565990.0, ans=0.125 2024-08-19 23:02:11,259 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4565990.0, ans=0.125 2024-08-19 23:02:19,396 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.69 vs. limit=6.0 2024-08-19 23:02:48,918 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4566190.0, ans=0.0 2024-08-19 23:03:17,573 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 24 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-19 23:03:21,837 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.798e+01 2.392e+01 2.640e+01 2.904e+01 4.083e+01, threshold=5.280e+01, percent-clipped=0.0 2024-08-19 23:03:31,863 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 12050, loss[loss=0.09856, beats_loss=0.0124, ecapa_loss=0.0001244, whisper_loss=0.08492, over 17523.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01049, ecapa_loss=0.0001399, whisper_loss=0.09016, over 3811496.49 frames. ], batch size: 70, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:03:42,253 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.70 vs. limit=22.5 2024-08-19 23:03:47,385 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4566390.0, ans=0.0 2024-08-19 23:04:29,164 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 
27 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-19 23:04:39,982 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4566690.0, ans=0.0 2024-08-19 23:05:00,116 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4566790.0, ans=0.5 2024-08-19 23:05:11,023 WARNING [optim.py:496] (1/4) Scaling gradients by 0.04405633732676506, model_norm_threshold=52.79664611816406 2024-08-19 23:05:11,178 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.250e+05, grad_sumsq=2.117e+07, orig_rms_sq=1.063e-02 2024-08-19 23:05:19,534 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 12100, loss[loss=0.097, beats_loss=0.01187, ecapa_loss=0.0001371, whisper_loss=0.08376, over 22179.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01051, ecapa_loss=0.0001393, whisper_loss=0.0901, over 3843315.17 frames. ], batch size: 89, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:05:23,316 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 25 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-19 23:05:30,113 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4566890.0, ans=0.1 2024-08-19 23:05:57,965 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4566990.0, ans=0.125 2024-08-19 23:06:18,250 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4567090.0, ans=0.0 2024-08-19 23:06:21,227 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4567090.0, ans=0.125 2024-08-19 23:06:26,480 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 
31 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-19 23:06:38,880 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4567190.0, ans=0.125 2024-08-19 23:06:41,645 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4567190.0, ans=0.1 2024-08-19 23:06:42,376 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4567190.0, ans=0.125 2024-08-19 23:06:44,556 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4567290.0, ans=0.2 2024-08-19 23:06:57,548 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.332e+01 2.618e+01 3.087e+01 1.198e+03, threshold=5.236e+01, percent-clipped=2.0 2024-08-19 23:07:01,621 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.65 vs. limit=6.0 2024-08-19 23:07:06,256 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 12150, loss[loss=0.1138, beats_loss=0.009544, ecapa_loss=0.0001312, whisper_loss=0.1029, over 21388.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01048, ecapa_loss=0.0001396, whisper_loss=0.09021, over 3844733.79 frames. ], batch size: 82, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:07:20,398 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=14.55 vs. limit=15.0 2024-08-19 23:07:39,319 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 26 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-19 23:07:44,024 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 29 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-19 23:07:50,879 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 
33 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-19 23:08:04,692 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4567590.0, ans=0.125 2024-08-19 23:08:22,979 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4567690.0, ans=0.1 2024-08-19 23:08:29,042 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 22 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-19 23:08:37,219 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4567790.0, ans=0.125 2024-08-19 23:08:40,391 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4567790.0, ans=0.125 2024-08-19 23:08:48,606 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 18 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-19 23:08:49,857 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 12200, loss[loss=0.09411, beats_loss=0.009649, ecapa_loss=0.0001505, whisper_loss=0.08296, over 15066.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01043, ecapa_loss=0.0001387, whisper_loss=0.09051, over 3794199.01 frames. ], batch size: 60, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:09:01,864 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4567890.0, ans=0.1 2024-08-19 23:09:03,552 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4567890.0, ans=0.05 2024-08-19 23:09:25,904 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4567990.0, ans=0.2 2024-08-19 23:09:31,551 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 
12 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-19 23:09:46,380 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 27 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-19 23:10:02,639 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.10 vs. limit=15.0 2024-08-19 23:10:06,575 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.70 vs. limit=15.0 2024-08-19 23:10:18,678 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.296e+01 2.498e+01 2.784e+01 3.860e+01, threshold=4.997e+01, percent-clipped=0.0 2024-08-19 23:10:26,696 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 12250, loss[loss=0.06998, beats_loss=0.01225, ecapa_loss=0.0001517, whisper_loss=0.05621, over 14492.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0104, ecapa_loss=0.0001392, whisper_loss=0.08981, over 3770478.95 frames. ], batch size: 63, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:10:29,039 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 21 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-19 23:10:29,765 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4568390.0, ans=0.0 2024-08-19 23:10:54,732 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4568490.0, ans=0.2 2024-08-19 23:10:57,686 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 18 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-19 23:11:01,675 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 24 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-19 23:11:07,815 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 
24 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-19 23:11:36,924 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4568690.0, ans=0.1 2024-08-19 23:11:43,702 INFO [train_multi_KD3.py:845] (1/4) A total of 52 cuts. 19 from LS+wenet, 18 from Vox, 15 fro AS 2024-08-19 23:11:56,600 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4568790.0, ans=0.1 2024-08-19 23:12:03,947 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 12300, loss[loss=0.09896, beats_loss=0.01074, ecapa_loss=0.0001415, whisper_loss=0.08681, over 19283.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01044, ecapa_loss=0.0001398, whisper_loss=0.0893, over 3752556.36 frames. ], batch size: 75, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:12:19,844 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4568890.0, ans=0.1 2024-08-19 23:12:23,403 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4568990.0, ans=0.0 2024-08-19 23:12:33,335 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4568990.0, ans=0.125 2024-08-19 23:13:15,318 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 12 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-19 23:13:34,110 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.740e+01 2.176e+01 2.435e+01 2.712e+01 4.279e+01, threshold=4.869e+01, percent-clipped=0.0 2024-08-19 23:13:42,436 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 12350, loss[loss=0.1, beats_loss=0.01021, ecapa_loss=0.0001588, whisper_loss=0.08824, over 21108.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.0105, ecapa_loss=0.00014, whisper_loss=0.08902, over 3766392.74 frames. 
], batch size: 85, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:14:16,088 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.06 vs. limit=6.0 2024-08-19 23:14:18,848 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.35 vs. limit=22.5 2024-08-19 23:14:24,442 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 17 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-19 23:14:27,805 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.07 vs. limit=6.0 2024-08-19 23:14:28,869 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4569590.0, ans=0.1 2024-08-19 23:14:36,094 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=4569590.0, ans=0.025 2024-08-19 23:14:44,540 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4569690.0, ans=0.2 2024-08-19 23:14:54,167 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 27 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-19 23:15:22,413 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4569790.0, ans=0.0 2024-08-19 23:15:23,635 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 24 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-19 23:15:24,710 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 12400, loss[loss=0.1116, beats_loss=0.007925, ecapa_loss=0.000149, whisper_loss=0.1021, over 16146.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01045, ecapa_loss=0.0001401, whisper_loss=0.08831, over 3738077.92 frames. 
], batch size: 62, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:15:58,344 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4569990.0, ans=0.0 2024-08-19 23:16:03,280 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4569990.0, ans=0.2 2024-08-19 23:16:05,545 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.01 vs. limit=15.0 2024-08-19 23:16:12,474 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 23:16:14,888 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 31 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-19 23:16:19,353 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.04 vs. limit=22.5 2024-08-19 23:16:34,933 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.04 vs. limit=6.0 2024-08-19 23:16:46,006 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-19 23:16:47,161 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 15 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-19 23:17:00,846 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.374e+01 2.637e+01 2.909e+01 4.258e+01, threshold=5.273e+01, percent-clipped=0.0 2024-08-19 23:17:09,504 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 12450, loss[loss=0.08251, beats_loss=0.01059, ecapa_loss=0.0001484, whisper_loss=0.07043, over 12668.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01043, ecapa_loss=0.0001409, whisper_loss=0.08871, over 3745726.69 frames. 
], batch size: 53, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:17:28,807 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 26 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-19 23:17:41,916 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=4570490.0, ans=0.05 2024-08-19 23:18:10,101 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4570690.0, ans=0.05 2024-08-19 23:18:14,422 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4570690.0, ans=0.125 2024-08-19 23:18:39,660 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 20 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-19 23:18:51,833 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 12500, loss[loss=0.1081, beats_loss=0.00979, ecapa_loss=0.0001314, whisper_loss=0.09695, over 14224.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01035, ecapa_loss=0.0001407, whisper_loss=0.08942, over 3748540.16 frames. ], batch size: 54, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:19:00,994 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4570890.0, ans=0.125 2024-08-19 23:19:06,283 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 30 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-19 23:19:09,912 INFO [train_multi_KD3.py:845] (1/4) A total of 52 cuts. 15 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-19 23:19:17,057 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 23 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-19 23:19:21,285 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4570990.0, ans=0.0 2024-08-19 23:19:44,134 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 
18 from LS+wenet, 29 from Vox, 44 fro AS 2024-08-19 23:19:49,789 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 18 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-19 23:20:13,917 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=4571190.0, ans=0.05 2024-08-19 23:20:32,611 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=4571290.0, ans=15.0 2024-08-19 23:20:34,888 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.27 vs. limit=22.5 2024-08-19 23:20:37,672 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.914e+01 2.234e+01 2.500e+01 2.814e+01 4.349e+01, threshold=5.000e+01, percent-clipped=0.0 2024-08-19 23:20:47,941 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 12550, loss[loss=0.08279, beats_loss=0.01037, ecapa_loss=0.0001469, whisper_loss=0.07096, over 17210.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01044, ecapa_loss=0.0001405, whisper_loss=0.08895, over 3763783.31 frames. ], batch size: 69, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:20:56,976 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 16 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-19 23:20:58,267 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4571390.0, ans=0.125 2024-08-19 23:21:06,999 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4571390.0, ans=0.125 2024-08-19 23:21:29,300 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.86 vs. 
limit=15.0 2024-08-19 23:21:40,662 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.30 vs. limit=15.0 2024-08-19 23:22:01,595 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4571690.0, ans=0.125 2024-08-19 23:22:03,986 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4571690.0, ans=0.125 2024-08-19 23:22:15,763 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.37 vs. limit=15.0 2024-08-19 23:22:32,365 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4571790.0, ans=0.0 2024-08-19 23:22:37,388 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 12600, loss[loss=0.09975, beats_loss=0.01006, ecapa_loss=0.0001149, whisper_loss=0.08854, over 19043.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01045, ecapa_loss=0.0001419, whisper_loss=0.08956, over 3801929.92 frames. ], batch size: 75, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:23:09,830 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 29 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-19 23:23:23,934 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4572090.0, ans=0.025 2024-08-19 23:23:41,886 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=4572090.0, ans=0.025 2024-08-19 23:23:53,261 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 32 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-19 23:23:58,488 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 
29 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-19 23:24:26,429 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4572290.0, ans=0.0 2024-08-19 23:24:32,015 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.292e+01 2.504e+01 2.662e+01 4.267e+01, threshold=5.008e+01, percent-clipped=0.0 2024-08-19 23:24:40,794 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 21 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-19 23:24:43,480 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 12650, loss[loss=0.1016, beats_loss=0.01135, ecapa_loss=0.0001513, whisper_loss=0.08876, over 14346.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01032, ecapa_loss=0.0001418, whisper_loss=0.09049, over 3815131.02 frames. ], batch size: 60, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:25:05,512 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4572390.0, ans=0.1 2024-08-19 23:25:19,558 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4572490.0, ans=0.2 2024-08-19 23:25:20,839 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 22 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-19 23:25:25,540 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 36 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-19 23:25:35,617 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.07 vs. 
limit=15.0 2024-08-19 23:25:40,033 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4572590.0, ans=0.04949747468305833 2024-08-19 23:25:48,732 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4572590.0, ans=0.125 2024-08-19 23:26:03,515 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4572690.0, ans=0.125 2024-08-19 23:26:14,848 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 18 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-19 23:26:36,816 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.95 vs. limit=22.5 2024-08-19 23:26:39,133 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 12700, loss[loss=0.1099, beats_loss=0.008268, ecapa_loss=0.0001794, whisper_loss=0.0998, over 17687.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01036, ecapa_loss=0.0001402, whisper_loss=0.09061, over 3816218.15 frames. 
], batch size: 71, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:27:27,741 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4573090.0, ans=0.125 2024-08-19 23:27:32,467 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4573090.0, ans=0.125 2024-08-19 23:27:48,606 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4573190.0, ans=0.2 2024-08-19 23:28:13,133 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4573290.0, ans=0.2 2024-08-19 23:28:25,043 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.31 vs. limit=10.0 2024-08-19 23:28:25,360 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.359e+01 2.539e+01 2.793e+01 4.602e+02, threshold=5.078e+01, percent-clipped=1.0 2024-08-19 23:28:27,217 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 26 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-19 23:28:34,451 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 12750, loss[loss=0.1209, beats_loss=0.01061, ecapa_loss=0.0001219, whisper_loss=0.1091, over 21975.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01039, ecapa_loss=0.0001403, whisper_loss=0.09023, over 3810503.35 frames. ], batch size: 86, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:28:51,136 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 28 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-19 23:29:21,631 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 
24 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-19 23:29:24,589 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4573590.0, ans=0.0 2024-08-19 23:29:24,934 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.59 vs. limit=15.0 2024-08-19 23:29:45,800 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4573690.0, ans=0.125 2024-08-19 23:29:45,855 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4573690.0, ans=0.125 2024-08-19 23:29:54,283 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4573690.0, ans=0.125 2024-08-19 23:29:59,466 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4573690.0, ans=0.2 2024-08-19 23:30:18,464 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4573790.0, ans=0.0 2024-08-19 23:30:33,222 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4573890.0, ans=0.2 2024-08-19 23:30:34,021 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 12800, loss[loss=0.1012, beats_loss=0.01092, ecapa_loss=0.000153, whisper_loss=0.08879, over 21620.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01036, ecapa_loss=0.0001413, whisper_loss=0.0907, over 3796457.92 frames. ], batch size: 90, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:30:48,223 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4573890.0, ans=0.125 2024-08-19 23:30:57,005 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 
20 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-19 23:31:40,340 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.94 vs. limit=22.5 2024-08-19 23:31:45,814 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. limit=6.0 2024-08-19 23:32:10,564 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 22 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-19 23:32:12,295 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4574290.0, ans=0.1 2024-08-19 23:32:27,805 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.310e+01 2.474e+01 2.739e+01 4.049e+01, threshold=4.949e+01, percent-clipped=0.0 2024-08-19 23:32:38,880 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 12850, loss[loss=0.0895, beats_loss=0.01315, ecapa_loss=0.0001183, whisper_loss=0.07516, over 21838.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01036, ecapa_loss=0.0001418, whisper_loss=0.09117, over 3839760.42 frames. ], batch size: 89, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:33:02,018 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4574490.0, ans=0.125 2024-08-19 23:33:02,555 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=4574490.0, ans=15.0 2024-08-19 23:33:33,678 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 21 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-19 23:33:45,421 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4574590.0, ans=0.2 2024-08-19 23:33:57,964 INFO [train_multi_KD3.py:845] (1/4) A total of 97 cuts. 
26 from LS+wenet, 28 from Vox, 43 fro AS 2024-08-19 23:33:59,313 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4574690.0, ans=0.1 2024-08-19 23:34:11,794 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.27 vs. limit=12.0 2024-08-19 23:34:29,359 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4574790.0, ans=0.2 2024-08-19 23:34:32,489 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.27 vs. limit=22.5 2024-08-19 23:34:37,001 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4574790.0, ans=0.125 2024-08-19 23:34:38,101 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 27 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-19 23:34:42,512 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.18 vs. limit=15.0 2024-08-19 23:34:43,109 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 12900, loss[loss=0.105, beats_loss=0.008745, ecapa_loss=0.0001754, whisper_loss=0.09451, over 20284.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01044, ecapa_loss=0.0001421, whisper_loss=0.09045, over 3843222.93 frames. ], batch size: 86, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:34:46,672 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4574890.0, ans=0.125 2024-08-19 23:35:18,148 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 
28 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-19 23:35:53,377 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4575090.0, ans=0.0 2024-08-19 23:35:55,699 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4575190.0, ans=0.1 2024-08-19 23:36:34,926 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.306e+01 2.601e+01 3.029e+01 4.481e+01, threshold=5.201e+01, percent-clipped=0.0 2024-08-19 23:36:44,571 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 12950, loss[loss=0.1117, beats_loss=0.009551, ecapa_loss=0.0001611, whisper_loss=0.1006, over 21894.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01047, ecapa_loss=0.0001414, whisper_loss=0.09077, over 3853445.00 frames. ], batch size: 84, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:36:57,326 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4575390.0, ans=0.2 2024-08-19 23:37:10,711 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4575490.0, ans=0.2 2024-08-19 23:37:12,102 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 35 from LS+wenet, 27 from Vox, 26 fro AS 2024-08-19 23:37:56,715 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4575690.0, ans=0.125 2024-08-19 23:38:42,776 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 13000, loss[loss=0.09548, beats_loss=0.01082, ecapa_loss=0.0001099, whisper_loss=0.08356, over 20538.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01053, ecapa_loss=0.0001408, whisper_loss=0.09053, over 3844931.78 frames. ], batch size: 81, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:39:26,016 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 
23 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-19 23:39:31,554 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.67 vs. limit=22.5 2024-08-19 23:39:56,513 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4576190.0, ans=0.1 2024-08-19 23:40:04,411 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.39 vs. limit=22.5 2024-08-19 23:40:19,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4576290.0, ans=0.125 2024-08-19 23:40:19,447 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.46 vs. limit=15.0 2024-08-19 23:40:21,789 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4576290.0, ans=0.125 2024-08-19 23:40:21,796 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4576290.0, ans=0.125 2024-08-19 23:40:30,068 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.284e+01 2.434e+01 2.790e+01 4.214e+01, threshold=4.868e+01, percent-clipped=0.0 2024-08-19 23:40:32,094 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 18 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-19 23:40:38,330 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 13050, loss[loss=0.08628, beats_loss=0.01309, ecapa_loss=0.0001485, whisper_loss=0.07171, over 14178.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01055, ecapa_loss=0.0001413, whisper_loss=0.09011, over 3835397.62 frames. 
], batch size: 62, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:40:43,494 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4576390.0, ans=0.125 2024-08-19 23:41:35,924 WARNING [optim.py:496] (1/4) Scaling gradients by 0.06296969205141068, model_norm_threshold=48.684600830078125 2024-08-19 23:41:36,080 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.824e+04, grad_sumsq=7.824e+04, orig_rms_sq=1.000e+00 2024-08-19 23:41:50,266 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.01 vs. limit=22.5 2024-08-19 23:42:22,009 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 13100, loss[loss=0.1113, beats_loss=0.0127, ecapa_loss=0.0001031, whisper_loss=0.09759, over 19734.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01055, ecapa_loss=0.0001418, whisper_loss=0.0896, over 3814135.00 frames. ], batch size: 74, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:42:32,234 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4576890.0, ans=0.125 2024-08-19 23:42:52,277 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 
17 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-19 23:42:55,281 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4576990.0, ans=0.0 2024-08-19 23:43:06,351 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4577090.0, ans=0.125 2024-08-19 23:43:11,396 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4577090.0, ans=0.0 2024-08-19 23:44:02,422 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.292e+01 2.525e+01 2.909e+01 7.731e+02, threshold=5.050e+01, percent-clipped=3.0 2024-08-19 23:44:10,311 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 13150, loss[loss=0.08916, beats_loss=0.01177, ecapa_loss=0.0001215, whisper_loss=0.07618, over 16502.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01058, ecapa_loss=0.000142, whisper_loss=0.08912, over 3794514.92 frames. ], batch size: 66, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:44:17,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4577390.0, ans=0.125 2024-08-19 23:44:42,109 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.87 vs. limit=15.0 2024-08-19 23:44:50,024 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4577590.0, ans=0.125 2024-08-19 23:44:51,807 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 22 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-19 23:44:52,939 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4577590.0, ans=0.2 2024-08-19 23:45:06,668 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 
42 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-19 23:45:12,600 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4577690.0, ans=0.125 2024-08-19 23:45:41,065 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.83 vs. limit=22.5 2024-08-19 23:45:43,171 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 13200, loss[loss=0.1408, beats_loss=0.004731, ecapa_loss=0.0001694, whisper_loss=0.1344, over 14863.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01042, ecapa_loss=0.0001424, whisper_loss=0.08987, over 3797304.34 frames. ], batch size: 56, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:45:49,208 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 17 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-19 23:46:25,367 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4578090.0, ans=0.0 2024-08-19 23:46:59,188 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.62 vs. limit=15.0 2024-08-19 23:47:05,036 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.26 vs. limit=15.0 2024-08-19 23:47:05,504 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.725e+01 2.290e+01 2.445e+01 2.755e+01 3.843e+01, threshold=4.889e+01, percent-clipped=0.0 2024-08-19 23:47:12,576 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 13250, loss[loss=0.1055, beats_loss=0.01031, ecapa_loss=0.0001324, whisper_loss=0.09386, over 23246.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01044, ecapa_loss=0.0001407, whisper_loss=0.08996, over 3796660.86 frames. 
], batch size: 90, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:47:34,317 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4578490.0, ans=0.125 2024-08-19 23:47:58,441 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 16 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-19 23:48:00,257 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4578590.0, ans=0.125 2024-08-19 23:48:22,477 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4578690.0, ans=0.1 2024-08-19 23:48:23,460 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 24 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-19 23:48:34,891 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 24 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-19 23:48:46,041 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=4578790.0, ans=15.0 2024-08-19 23:48:51,794 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 13300, loss[loss=0.1039, beats_loss=0.008828, ecapa_loss=0.0001263, whisper_loss=0.0938, over 15703.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01042, ecapa_loss=0.0001406, whisper_loss=0.08966, over 3784611.92 frames. ], batch size: 59, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:49:03,588 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4578890.0, ans=10.0 2024-08-19 23:49:19,826 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 36 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-19 23:49:23,847 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 
23 from LS+wenet, 26 from Vox, 17 fro AS 2024-08-19 23:49:24,057 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4578990.0, ans=0.1 2024-08-19 23:49:40,362 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4579090.0, ans=0.1 2024-08-19 23:50:11,661 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4579290.0, ans=0.0 2024-08-19 23:50:15,855 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 11 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-19 23:50:17,583 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.697e+01 2.270e+01 2.522e+01 2.849e+01 4.114e+01, threshold=5.043e+01, percent-clipped=0.0 2024-08-19 23:50:24,317 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 13350, loss[loss=0.09546, beats_loss=0.01009, ecapa_loss=0.0001558, whisper_loss=0.08381, over 13811.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01042, ecapa_loss=0.0001412, whisper_loss=0.08886, over 3771220.71 frames. ], batch size: 54, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:50:30,383 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 24 from LS+wenet, 16 from Vox, 15 fro AS 2024-08-19 23:50:32,166 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 31 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-19 23:50:50,108 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 18 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-19 23:50:55,493 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.55 vs. limit=12.0 2024-08-19 23:51:02,570 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 
28 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-19 23:51:14,074 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.97 vs. limit=10.0 2024-08-19 23:51:22,829 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 29 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-19 23:51:32,246 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 29 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-19 23:51:41,652 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4579790.0, ans=0.015 2024-08-19 23:51:58,104 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 13400, loss[loss=0.08881, beats_loss=0.01284, ecapa_loss=0.0001147, whisper_loss=0.07482, over 17470.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.0104, ecapa_loss=0.0001401, whisper_loss=0.08912, over 3749938.47 frames. ], batch size: 70, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:51:58,355 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 19 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-19 23:52:00,847 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.20 vs. limit=15.0 2024-08-19 23:52:01,797 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 28 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-19 23:52:34,968 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.87 vs. limit=15.0 2024-08-19 23:53:18,696 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 
17 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-19 23:53:25,796 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.574e+01 2.343e+01 2.589e+01 2.932e+01 2.538e+02, threshold=5.179e+01, percent-clipped=4.0 2024-08-19 23:53:33,087 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 13450, loss[loss=0.1162, beats_loss=0.01085, ecapa_loss=0.0001226, whisper_loss=0.1041, over 22621.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01038, ecapa_loss=0.0001407, whisper_loss=0.0888, over 3738616.34 frames. ], batch size: 90, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:53:35,779 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 14 from LS+wenet, 8 from Vox, 29 fro AS 2024-08-19 23:53:42,053 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.42 vs. limit=12.0 2024-08-19 23:54:11,251 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4580590.0, ans=0.2 2024-08-19 23:54:14,463 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.49 vs. limit=15.0 2024-08-19 23:54:26,882 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4580590.0, ans=0.125 2024-08-19 23:54:38,571 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 20 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-19 23:54:46,707 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4580690.0, ans=0.125 2024-08-19 23:55:10,924 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 13500, loss[loss=0.08913, beats_loss=0.01115, ecapa_loss=0.0001689, whisper_loss=0.07629, over 17154.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01039, ecapa_loss=0.0001402, whisper_loss=0.08939, over 3764962.07 frames. 
], batch size: 72, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:55:54,457 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 30 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-19 23:55:54,816 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4581090.0, ans=0.125 2024-08-19 23:56:05,002 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 36 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-19 23:56:31,337 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 16 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-19 23:56:35,999 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.361e+01 2.616e+01 2.856e+01 5.147e+01, threshold=5.232e+01, percent-clipped=0.0 2024-08-19 23:56:38,654 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4581290.0, ans=0.015 2024-08-19 23:56:43,178 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 13550, loss[loss=0.113, beats_loss=0.008002, ecapa_loss=0.0001479, whisper_loss=0.1035, over 23380.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0104, ecapa_loss=0.0001393, whisper_loss=0.08984, over 3783455.90 frames. ], batch size: 90, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:57:20,488 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4581590.0, ans=0.125 2024-08-19 23:57:37,604 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4581690.0, ans=0.125 2024-08-19 23:57:46,341 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4581690.0, ans=0.1 2024-08-19 23:57:57,850 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.60 vs. 
limit=15.0 2024-08-19 23:58:03,301 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4581790.0, ans=0.125 2024-08-19 23:58:06,679 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 34 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-19 23:58:10,411 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 27 from LS+wenet, 11 from Vox, 35 fro AS 2024-08-19 23:58:12,360 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 20 from LS+wenet, 8 from Vox, 22 fro AS 2024-08-19 23:58:17,086 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 13600, loss[loss=0.09024, beats_loss=0.009558, ecapa_loss=0.0001672, whisper_loss=0.07901, over 19946.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0104, ecapa_loss=0.0001404, whisper_loss=0.08971, over 3768416.74 frames. ], batch size: 82, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:58:39,050 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4581990.0, ans=0.0 2024-08-19 23:59:11,905 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4582190.0, ans=0.0 2024-08-19 23:59:12,208 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.23 vs. limit=6.0 2024-08-19 23:59:28,534 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4582290.0, ans=0.2 2024-08-19 23:59:29,646 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 23 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-19 23:59:39,900 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.691e+01 2.261e+01 2.439e+01 2.757e+01 6.326e+01, threshold=4.878e+01, percent-clipped=1.0 2024-08-19 23:59:43,917 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 
31 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-19 23:59:47,344 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 13650, loss[loss=0.1066, beats_loss=0.009661, ecapa_loss=0.000139, whisper_loss=0.09556, over 22008.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01042, ecapa_loss=0.0001413, whisper_loss=0.08951, over 3766703.12 frames. ], batch size: 90, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:59:47,533 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 25 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-19 23:59:49,791 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.14 vs. limit=6.0 2024-08-20 00:00:01,371 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4582390.0, ans=0.125 2024-08-20 00:00:02,775 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 25 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-20 00:00:12,003 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4582490.0, ans=0.0 2024-08-20 00:00:38,686 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4582590.0, ans=0.125 2024-08-20 00:00:44,077 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4582690.0, ans=0.2 2024-08-20 00:00:51,655 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 
16 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-20 00:00:56,863 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4582690.0, ans=0.125 2024-08-20 00:01:09,969 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4582790.0, ans=0.125 2024-08-20 00:01:19,125 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4582890.0, ans=0.0 2024-08-20 00:01:20,266 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 13700, loss[loss=0.1004, beats_loss=0.009522, ecapa_loss=0.0001487, whisper_loss=0.08938, over 18418.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01035, ecapa_loss=0.0001415, whisper_loss=0.08944, over 3764284.87 frames. ], batch size: 77, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:01:36,695 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4582990.0, ans=0.125 2024-08-20 00:01:53,902 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4582990.0, ans=0.2 2024-08-20 00:01:55,014 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 26 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-20 00:02:00,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4583090.0, ans=0.0 2024-08-20 00:02:23,085 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4583190.0, ans=0.0 2024-08-20 00:02:29,334 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 
34 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-20 00:02:43,348 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4583290.0, ans=0.125 2024-08-20 00:02:43,682 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.74 vs. limit=15.0 2024-08-20 00:02:46,867 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.351e+01 2.599e+01 2.817e+01 2.023e+02, threshold=5.198e+01, percent-clipped=1.0 2024-08-20 00:02:54,316 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 13750, loss[loss=0.101, beats_loss=0.01046, ecapa_loss=0.0001238, whisper_loss=0.08927, over 21383.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01033, ecapa_loss=0.0001416, whisper_loss=0.09015, over 3796231.92 frames. ], batch size: 83, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:02:56,395 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4583390.0, ans=0.125 2024-08-20 00:03:28,453 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4583490.0, ans=0.1 2024-08-20 00:03:39,734 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 41 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-20 00:03:39,884 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4583590.0, ans=0.125 2024-08-20 00:03:56,151 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4583690.0, ans=0.125 2024-08-20 00:04:02,834 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 
13 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-20 00:04:14,407 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4583790.0, ans=0.2 2024-08-20 00:04:20,129 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4583790.0, ans=0.125 2024-08-20 00:04:23,477 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 17 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-20 00:04:27,657 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 13800, loss[loss=0.1127, beats_loss=0.009579, ecapa_loss=0.0001503, whisper_loss=0.1016, over 21446.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01029, ecapa_loss=0.0001418, whisper_loss=0.09113, over 3792432.98 frames. ], batch size: 88, lr: 1.96e-03, grad_scale: 1.152921504606847e+18 2024-08-20 00:04:39,569 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 33 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-20 00:04:40,406 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.26 vs. 
limit=15.0 2024-08-20 00:04:59,668 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4583990.0, ans=0.125 2024-08-20 00:05:03,321 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4584090.0, ans=0.125 2024-08-20 00:05:15,279 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4584090.0, ans=0.1 2024-08-20 00:05:25,182 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=8.014e+00 2024-08-20 00:05:30,359 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4584190.0, ans=0.0 2024-08-20 00:05:42,697 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 25 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-20 00:05:51,491 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.269e+01 2.538e+01 2.800e+01 5.388e+01, threshold=5.076e+01, percent-clipped=1.0 2024-08-20 00:05:57,936 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 13850, loss[loss=0.08419, beats_loss=0.01073, ecapa_loss=0.0001354, whisper_loss=0.0721, over 20154.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01027, ecapa_loss=0.0001414, whisper_loss=0.0912, over 3789885.58 frames. ], batch size: 81, lr: 1.96e-03, grad_scale: 1.152921504606847e+18 2024-08-20 00:06:02,185 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 
19 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-20 00:06:26,002 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4584490.0, ans=0.125 2024-08-20 00:06:36,555 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4584590.0, ans=0.125 2024-08-20 00:07:21,757 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 22 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-20 00:07:30,772 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 13900, loss[loss=0.08997, beats_loss=0.01008, ecapa_loss=0.0001191, whisper_loss=0.0787, over 15880.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01032, ecapa_loss=0.0001406, whisper_loss=0.09191, over 3812994.00 frames. ], batch size: 60, lr: 1.96e-03, grad_scale: 1.152921504606847e+18 2024-08-20 00:07:30,976 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 16 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-20 00:07:33,652 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4584890.0, ans=0.0 2024-08-20 00:07:51,218 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 17 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-20 00:08:15,941 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4585090.0, ans=0.0 2024-08-20 00:08:17,108 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4585090.0, ans=0.2 2024-08-20 00:08:26,801 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. 
limit=6.0 2024-08-20 00:08:28,354 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-20 00:08:45,885 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.66 vs. limit=15.0 2024-08-20 00:08:56,352 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.975e+01 2.301e+01 2.533e+01 2.957e+01 6.862e+01, threshold=5.066e+01, percent-clipped=1.0 2024-08-20 00:09:01,233 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4585290.0, ans=0.0 2024-08-20 00:09:03,976 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 13950, loss[loss=0.131, beats_loss=0.007305, ecapa_loss=0.0001403, whisper_loss=0.1223, over 19277.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01042, ecapa_loss=0.0001405, whisper_loss=0.09137, over 3775707.89 frames. ], batch size: 68, lr: 1.96e-03, grad_scale: 1.152921504606847e+18 2024-08-20 00:09:25,003 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 21 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-20 00:09:34,539 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4585490.0, ans=0.0 2024-08-20 00:09:35,952 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 31 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-20 00:09:45,896 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4585590.0, ans=0.95 2024-08-20 00:09:58,284 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 
24 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-20 00:09:58,687 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4585590.0, ans=0.125 2024-08-20 00:10:20,425 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4585790.0, ans=0.0 2024-08-20 00:10:30,748 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 23 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-20 00:10:40,045 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 14000, loss[loss=0.09786, beats_loss=0.0101, ecapa_loss=0.0001267, whisper_loss=0.08649, over 13459.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0104, ecapa_loss=0.0001405, whisper_loss=0.09175, over 3775320.98 frames. ], batch size: 51, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:11:23,564 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 25 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-20 00:11:33,655 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4586190.0, ans=0.1 2024-08-20 00:11:38,691 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 23 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-20 00:11:47,619 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 30 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-20 00:11:51,875 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.86 vs. limit=15.0 2024-08-20 00:12:02,003 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 18 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-20 00:12:06,535 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 
20 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-20 00:12:07,575 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.217e+01 2.438e+01 2.736e+01 1.084e+02, threshold=4.877e+01, percent-clipped=1.0 2024-08-20 00:12:15,036 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 14050, loss[loss=0.09098, beats_loss=0.009494, ecapa_loss=0.0001302, whisper_loss=0.08019, over 14533.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01042, ecapa_loss=0.0001398, whisper_loss=0.09127, over 3799327.67 frames. ], batch size: 57, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:13:01,245 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 22 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-20 00:13:07,334 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.42 vs. limit=15.0 2024-08-20 00:13:14,159 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4586690.0, ans=0.125 2024-08-20 00:13:31,713 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4586790.0, ans=0.05 2024-08-20 00:13:49,160 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 14100, loss[loss=0.1191, beats_loss=0.009062, ecapa_loss=0.0001485, whisper_loss=0.1086, over 23051.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01037, ecapa_loss=0.0001392, whisper_loss=0.0915, over 3805263.30 frames. ], batch size: 90, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:13:51,259 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 25 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-20 00:13:56,467 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 
23 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-20 00:14:11,493 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4586990.0, ans=0.0 2024-08-20 00:14:19,231 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 24 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-20 00:14:25,916 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4586990.0, ans=0.125 2024-08-20 00:14:28,664 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 23 from LS+wenet, 29 from Vox, 42 fro AS 2024-08-20 00:14:38,774 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4587090.0, ans=0.125 2024-08-20 00:14:44,573 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.53 vs. limit=15.0 2024-08-20 00:15:07,525 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4587290.0, ans=0.125 2024-08-20 00:15:07,673 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.01 vs. limit=15.0 2024-08-20 00:15:18,590 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.279e+01 2.555e+01 2.827e+01 5.250e+01, threshold=5.111e+01, percent-clipped=1.0 2024-08-20 00:15:22,525 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 29 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-20 00:15:24,375 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 14150, loss[loss=0.114, beats_loss=0.009335, ecapa_loss=0.0001329, whisper_loss=0.1034, over 21376.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01036, ecapa_loss=0.0001397, whisper_loss=0.09109, over 3810761.82 frames. 
], batch size: 81, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:16:19,654 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4587590.0, ans=0.125 2024-08-20 00:16:27,834 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4587690.0, ans=0.125 2024-08-20 00:16:42,468 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4587790.0, ans=0.04949747468305833 2024-08-20 00:16:59,915 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 14200, loss[loss=0.1061, beats_loss=0.01213, ecapa_loss=0.0001155, whisper_loss=0.09278, over 23656.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01036, ecapa_loss=0.0001393, whisper_loss=0.091, over 3794252.62 frames. ], batch size: 94, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:17:00,987 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.62 vs. limit=15.0 2024-08-20 00:17:17,059 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.86 vs. limit=12.0 2024-08-20 00:17:21,653 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 14 from LS+wenet, 10 from Vox, 40 fro AS 2024-08-20 00:17:38,977 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4588090.0, ans=0.125 2024-08-20 00:17:48,023 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 
18 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-20 00:17:57,493 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=4588190.0, ans=0.05 2024-08-20 00:18:06,756 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4588190.0, ans=0.125 2024-08-20 00:18:10,890 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4588190.0, ans=0.125 2024-08-20 00:18:27,135 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.209e+01 2.487e+01 2.835e+01 4.985e+01, threshold=4.974e+01, percent-clipped=0.0 2024-08-20 00:18:33,071 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 14250, loss[loss=0.09893, beats_loss=0.01077, ecapa_loss=0.000163, whisper_loss=0.08653, over 19310.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01041, ecapa_loss=0.0001391, whisper_loss=0.09032, over 3806832.46 frames. ], batch size: 81, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:18:33,722 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4588390.0, ans=0.0 2024-08-20 00:18:37,391 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4588390.0, ans=0.1 2024-08-20 00:18:49,289 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 15 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-20 00:18:50,428 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4588490.0, ans=0.125 2024-08-20 00:19:16,513 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 29 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-20 00:19:43,413 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 
26 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-20 00:20:06,754 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 14300, loss[loss=0.08292, beats_loss=0.01123, ecapa_loss=0.0001077, whisper_loss=0.07062, over 16113.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01042, ecapa_loss=0.000139, whisper_loss=0.0901, over 3819752.97 frames. ], batch size: 62, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:20:11,144 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 18 from LS+wenet, 10 from Vox, 25 fro AS 2024-08-20 00:20:12,821 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4588890.0, ans=0.0 2024-08-20 00:20:32,322 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4588990.0, ans=0.125 2024-08-20 00:20:46,183 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.76 vs. limit=22.5 2024-08-20 00:21:34,558 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4589290.0, ans=0.95 2024-08-20 00:21:36,003 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.778e+01 2.288e+01 2.504e+01 2.843e+01 5.964e+01, threshold=5.008e+01, percent-clipped=1.0 2024-08-20 00:21:42,162 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 14350, loss[loss=0.1035, beats_loss=0.01192, ecapa_loss=0.0001354, whisper_loss=0.09025, over 22659.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01035, ecapa_loss=0.0001388, whisper_loss=0.0906, over 3818367.09 frames. ], batch size: 89, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:21:44,385 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 
32 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-20 00:21:50,561 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-20 00:21:50,594 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4589390.0, ans=0.125 2024-08-20 00:21:55,954 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4589390.0, ans=0.0 2024-08-20 00:22:31,835 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4589590.0, ans=0.125 2024-08-20 00:22:35,427 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 21 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-20 00:22:42,429 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4589690.0, ans=0.05 2024-08-20 00:22:48,402 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4589690.0, ans=0.04949747468305833 2024-08-20 00:23:01,578 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 16 from LS+wenet, 18 from Vox, 16 fro AS 2024-08-20 00:23:05,431 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4589790.0, ans=0.125 2024-08-20 00:23:09,373 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4589790.0, ans=0.125 2024-08-20 00:23:14,348 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.978e+01 2024-08-20 00:23:16,989 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 14400, loss[loss=0.1023, beats_loss=0.009869, ecapa_loss=0.0001264, whisper_loss=0.09113, over 23289.00 frames. 
], tot_loss[loss=0.1024, beats_loss=0.01041, ecapa_loss=0.000138, whisper_loss=0.09063, over 3832357.52 frames. ], batch size: 90, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:23:20,659 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4589890.0, ans=0.125 2024-08-20 00:23:47,190 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 25 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-20 00:23:53,776 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 23 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-20 00:24:01,762 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4590090.0, ans=0.0 2024-08-20 00:24:05,816 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4590090.0, ans=0.125 2024-08-20 00:24:21,108 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.86 vs. limit=12.0 2024-08-20 00:24:41,312 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.268e+01 2.508e+01 2.742e+01 3.367e+01, threshold=5.015e+01, percent-clipped=0.0 2024-08-20 00:24:48,101 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 14450, loss[loss=0.09899, beats_loss=0.01129, ecapa_loss=0.0001591, whisper_loss=0.08611, over 14481.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01041, ecapa_loss=0.0001387, whisper_loss=0.09086, over 3840581.19 frames. 
], batch size: 59, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:24:51,012 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=7.548e+00 2024-08-20 00:24:51,040 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4590390.0, ans=0.0 2024-08-20 00:24:54,448 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 24 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-20 00:25:07,756 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4590490.0, ans=0.125 2024-08-20 00:25:16,763 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=4590490.0, ans=0.5 2024-08-20 00:26:03,173 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4590690.0, ans=0.0 2024-08-20 00:26:08,631 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4590790.0, ans=0.125 2024-08-20 00:26:24,557 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 14500, loss[loss=0.1016, beats_loss=0.008721, ecapa_loss=0.0001674, whisper_loss=0.09117, over 19326.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01042, ecapa_loss=0.000138, whisper_loss=0.09102, over 3865153.07 frames. ], batch size: 79, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:26:34,714 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4590890.0, ans=0.125 2024-08-20 00:26:49,233 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 20 from LS+wenet, 34 from Vox, 30 fro AS 2024-08-20 00:27:01,021 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.78 vs. 
limit=15.0 2024-08-20 00:27:02,390 WARNING [optim.py:496] (1/4) Scaling gradients by 0.034884583204984665, model_norm_threshold=50.15473556518555 2024-08-20 00:27:02,548 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.43, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.848e+05, grad_sumsq=8.328e+07, orig_rms_sq=1.062e-02 2024-08-20 00:27:02,765 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 21 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-20 00:27:16,240 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.13 vs. limit=15.0 2024-08-20 00:27:28,078 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4591190.0, ans=0.0 2024-08-20 00:27:29,941 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4591190.0, ans=0.125 2024-08-20 00:27:35,130 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 27 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-20 00:27:39,424 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.06 vs. limit=22.5 2024-08-20 00:27:40,668 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 12 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-20 00:27:41,468 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4591290.0, ans=0.2 2024-08-20 00:27:46,129 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.99 vs. 
limit=15.0 2024-08-20 00:27:52,609 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.301e+01 2.496e+01 2.802e+01 1.438e+03, threshold=4.992e+01, percent-clipped=1.0 2024-08-20 00:27:59,127 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 14550, loss[loss=0.1161, beats_loss=0.009646, ecapa_loss=0.0001217, whisper_loss=0.1052, over 16250.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01035, ecapa_loss=0.0001409, whisper_loss=0.09057, over 3840859.66 frames. ], batch size: 62, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:28:07,114 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4591390.0, ans=0.2 2024-08-20 00:28:12,266 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 29 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-20 00:28:33,086 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 27 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-20 00:28:33,271 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4591490.0, ans=0.125 2024-08-20 00:28:59,218 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 21 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-20 00:29:05,141 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.35 vs. limit=22.5 2024-08-20 00:29:14,437 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4591790.0, ans=0.125 2024-08-20 00:29:28,076 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.70 vs. limit=22.5 2024-08-20 00:29:33,052 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 14600, loss[loss=0.09916, beats_loss=0.01234, ecapa_loss=0.0001208, whisper_loss=0.08561, over 14918.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.01034, ecapa_loss=0.000141, whisper_loss=0.09035, over 3865367.34 frames. ], batch size: 59, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:29:33,486 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4591890.0, ans=0.1 2024-08-20 00:29:44,471 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4591890.0, ans=0.2 2024-08-20 00:29:57,958 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4591990.0, ans=0.125 2024-08-20 00:30:03,200 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 26 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-20 00:30:11,124 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 30 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-20 00:30:27,316 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.16 vs. limit=12.0 2024-08-20 00:30:35,405 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 21 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-20 00:30:55,015 INFO [train_multi_KD3.py:845] (1/4) A total of 54 cuts. 24 from LS+wenet, 10 from Vox, 20 fro AS 2024-08-20 00:31:02,130 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.406e+01 2.621e+01 2.917e+01 4.385e+01, threshold=5.242e+01, percent-clipped=0.0 2024-08-20 00:31:03,586 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4592290.0, ans=0.1 2024-08-20 00:31:07,375 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 14650, loss[loss=0.09176, beats_loss=0.01094, ecapa_loss=0.0001491, whisper_loss=0.07932, over 21692.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01037, ecapa_loss=0.0001413, whisper_loss=0.09053, over 3908872.08 frames. 
], batch size: 88, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:31:35,944 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.14 vs. limit=22.5 2024-08-20 00:32:10,067 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4592690.0, ans=0.2 2024-08-20 00:32:32,690 INFO [train_multi_KD3.py:845] (1/4) A total of 52 cuts. 16 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-20 00:32:41,465 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 14700, loss[loss=0.09169, beats_loss=0.01112, ecapa_loss=0.0001199, whisper_loss=0.07936, over 20477.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01048, ecapa_loss=0.0001402, whisper_loss=0.08967, over 3890760.57 frames. ], batch size: 79, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:33:10,429 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4592990.0, ans=0.125 2024-08-20 00:33:28,185 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 
25 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-20 00:33:44,785 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 00:33:58,833 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4593290.0, ans=0.125 2024-08-20 00:34:07,860 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4593290.0, ans=0.0 2024-08-20 00:34:11,508 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4593290.0, ans=0.1 2024-08-20 00:34:12,265 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.392e+01 2.545e+01 2.884e+01 3.743e+01, threshold=5.090e+01, percent-clipped=0.0 2024-08-20 00:34:17,491 INFO [train_multi_KD3.py:1117] (1/4) Epoch 31, batch 14750, loss[loss=0.09115, beats_loss=0.01276, ecapa_loss=0.0001302, whisper_loss=0.07708, over 22468.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01044, ecapa_loss=0.00014, whisper_loss=0.08961, over 3863354.10 frames. ], batch size: 90, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:34:51,693 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4593490.0, ans=0.0 2024-08-20 00:35:23,676 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4593690.0, ans=0.125 2024-08-20 00:36:08,653 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 0, loss[loss=0.1004, beats_loss=0.01104, ecapa_loss=0.0001569, whisper_loss=0.08778, over 18457.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01104, ecapa_loss=0.0001569, whisper_loss=0.08778, over 18457.00 frames. 
], batch size: 74, lr: 1.93e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:36:08,654 INFO [train_multi_KD3.py:1140] (1/4) Computing validation loss 2024-08-20 00:36:43,411 INFO [train_multi_KD3.py:1150] (1/4) Epoch 32, validation on ASR_libri: loss=0.2539, beats_loss=0, ecapa_loss=0.0005131, whisper_loss=0.2488, over 931116.00 frames. 2024-08-20 00:37:05,723 INFO [train_multi_KD3.py:1150] (1/4) Epoch 32, validation on SV_voxceleb1: loss=0.004, beats_loss=0, ecapa_loss=0.0004, whisper_loss=0, over 944235.00 frames. 2024-08-20 00:38:39,894 INFO [train_multi_KD3.py:1150] (1/4) Epoch 32, validation on AT_audioset: loss=0.02299, beats_loss=0.02299, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-20 00:38:39,900 INFO [train_multi_KD3.py:1156] (1/4) Maximum memory allocated so far is 30838MB 2024-08-20 00:39:04,231 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4593900.0, ans=0.125 2024-08-20 00:39:04,346 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4593900.0, ans=0.04949747468305833 2024-08-20 00:39:06,635 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4593900.0, ans=0.125 2024-08-20 00:39:16,647 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4593900.0, ans=0.0 2024-08-20 00:39:18,408 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 24 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-20 00:39:19,078 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4593900.0, ans=0.2 2024-08-20 00:39:31,746 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.45 vs. 
limit=22.5 2024-08-20 00:39:42,458 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 26 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-20 00:39:45,054 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 22 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-20 00:39:51,180 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4594000.0, ans=0.125 2024-08-20 00:39:55,584 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4594100.0, ans=0.1 2024-08-20 00:39:58,233 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4594100.0, ans=0.125 2024-08-20 00:40:40,564 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 50, loss[loss=0.1045, beats_loss=0.007763, ecapa_loss=0.000173, whisper_loss=0.09504, over 18466.00 frames. ], tot_loss[loss=0.09593, beats_loss=0.009754, ecapa_loss=0.0001457, whisper_loss=0.08472, over 854874.26 frames. ], batch size: 73, lr: 1.93e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:40:53,118 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 25 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-20 00:40:55,063 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.478e+01 2.729e+01 3.043e+01 3.966e+01, threshold=5.458e+01, percent-clipped=0.0 2024-08-20 00:41:00,495 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4594300.0, ans=0.0 2024-08-20 00:41:10,549 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 
21 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-20 00:41:11,102 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=4594400.0, ans=0.025 2024-08-20 00:41:13,650 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4594400.0, ans=0.2 2024-08-20 00:41:38,956 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 33 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-20 00:41:49,795 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.89 vs. limit=22.5 2024-08-20 00:42:00,171 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 34 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-20 00:42:11,040 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 23 from LS+wenet, 10 from Vox, 30 fro AS 2024-08-20 00:42:27,714 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.65 vs. limit=10.0 2024-08-20 00:42:38,540 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 100, loss[loss=0.1069, beats_loss=0.008182, ecapa_loss=0.0001447, whisper_loss=0.09729, over 16429.00 frames. ], tot_loss[loss=0.09936, beats_loss=0.009295, ecapa_loss=0.0001467, whisper_loss=0.0886, over 1482562.37 frames. ], batch size: 65, lr: 1.93e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:42:38,761 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 25 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-20 00:42:56,765 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 33 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-20 00:42:57,756 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.03 vs. limit=6.0 2024-08-20 00:42:59,309 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 
20 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-20 00:42:59,564 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4594800.0, ans=0.125 2024-08-20 00:43:21,423 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 31 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-20 00:43:25,857 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4595000.0, ans=0.125 2024-08-20 00:43:44,067 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.26 vs. limit=15.0 2024-08-20 00:43:48,608 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4595100.0, ans=0.125 2024-08-20 00:44:37,713 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 150, loss[loss=0.1083, beats_loss=0.007887, ecapa_loss=0.0001907, whisper_loss=0.09853, over 16134.00 frames. ], tot_loss[loss=0.09947, beats_loss=0.009244, ecapa_loss=0.0001459, whisper_loss=0.08876, over 1972574.66 frames. ], batch size: 68, lr: 1.93e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:44:46,791 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4595300.0, ans=0.125 2024-08-20 00:44:50,015 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.031e+01 2.529e+01 2.741e+01 3.091e+01 3.915e+01, threshold=5.483e+01, percent-clipped=0.0 2024-08-20 00:45:05,968 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4595400.0, ans=0.0 2024-08-20 00:45:25,494 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 
18 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-20 00:46:12,733 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 200, loss[loss=0.08048, beats_loss=0.0129, ecapa_loss=0.0001489, whisper_loss=0.06609, over 17759.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.009432, ecapa_loss=0.0001447, whisper_loss=0.08967, over 2345711.70 frames. ], batch size: 73, lr: 1.93e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:46:17,084 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4595800.0, ans=0.07 2024-08-20 00:46:45,628 WARNING [optim.py:496] (1/4) Scaling gradients by 0.03673094883561134, model_norm_threshold=54.82755661010742 2024-08-20 00:46:45,788 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.205e+05, grad_sumsq=3.205e+05, orig_rms_sq=1.000e+00 2024-08-20 00:46:57,301 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=4596000.0, ans=0.02 2024-08-20 00:47:11,002 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4596100.0, ans=0.0 2024-08-20 00:47:15,255 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4596100.0, ans=0.2 2024-08-20 00:47:25,643 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 18 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-20 00:47:32,554 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 16 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-20 00:47:43,267 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 250, loss[loss=0.0694, beats_loss=0.01223, ecapa_loss=0.0001036, whisper_loss=0.05614, over 16839.00 frames. ], tot_loss[loss=0.101, beats_loss=0.009659, ecapa_loss=0.0001434, whisper_loss=0.08995, over 2661684.85 frames. 
], batch size: 66, lr: 1.93e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:47:47,066 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4596300.0, ans=0.0 2024-08-20 00:47:53,496 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.311e+01 2.593e+01 2.981e+01 1.493e+03, threshold=5.185e+01, percent-clipped=1.0 2024-08-20 00:48:06,480 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4596400.0, ans=0.0 2024-08-20 00:48:21,648 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4596500.0, ans=0.0 2024-08-20 00:48:30,518 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 21 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-20 00:48:30,825 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.229e+05 2024-08-20 00:48:51,467 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 21 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-20 00:48:52,668 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 21 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-20 00:49:06,567 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 25 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-20 00:49:08,818 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4596800.0, ans=0.2 2024-08-20 00:49:09,726 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 300, loss[loss=0.1069, beats_loss=0.01093, ecapa_loss=0.0001257, whisper_loss=0.09473, over 16775.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.009898, ecapa_loss=0.0001433, whisper_loss=0.08975, over 2885214.88 frames. 
], batch size: 62, lr: 1.93e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:49:19,868 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.49 vs. limit=15.0 2024-08-20 00:49:25,994 WARNING [optim.py:496] (1/4) Scaling gradients by 0.03387049213051796, model_norm_threshold=51.854286193847656 2024-08-20 00:49:26,150 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.005e+05, grad_sumsq=9.116e+04, orig_rms_sq=3.297e+00 2024-08-20 00:49:34,306 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.83 vs. limit=22.5 2024-08-20 00:49:38,465 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 28 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-20 00:49:45,400 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4597000.0, ans=0.125 2024-08-20 00:49:51,935 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 16 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-20 00:50:02,051 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.54 vs. limit=15.0 2024-08-20 00:50:07,414 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4597100.0, ans=0.1 2024-08-20 00:50:30,286 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4597200.0, ans=0.0 2024-08-20 00:50:37,086 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 350, loss[loss=0.09983, beats_loss=0.01056, ecapa_loss=0.000137, whisper_loss=0.08789, over 13491.00 frames. 
], tot_loss[loss=0.1007, beats_loss=0.01014, ecapa_loss=0.0001415, whisper_loss=0.08914, over 3068802.53 frames. ], batch size: 51, lr: 1.93e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:50:48,020 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.225e+01 2.468e+01 2.778e+01 1.531e+03, threshold=4.937e+01, percent-clipped=2.0 2024-08-20 00:50:59,650 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. limit=6.0 2024-08-20 00:51:06,278 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4597400.0, ans=0.125 2024-08-20 00:51:08,944 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4597400.0, ans=0.5 2024-08-20 00:51:23,936 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4597500.0, ans=0.0 2024-08-20 00:51:37,771 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=4597600.0, ans=0.02 2024-08-20 00:51:41,249 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=4597600.0, ans=0.95 2024-08-20 00:51:46,332 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4597700.0, ans=0.125 2024-08-20 00:52:04,906 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 400, loss[loss=0.1143, beats_loss=0.01025, ecapa_loss=0.0001382, whisper_loss=0.1026, over 19360.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01023, ecapa_loss=0.0001407, whisper_loss=0.08878, over 3251840.63 frames. 
], batch size: 77, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:52:18,443 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4597800.0, ans=0.0 2024-08-20 00:52:28,703 INFO [train_multi_KD3.py:845] (1/4) A total of 96 cuts. 33 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-20 00:52:36,448 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.67 vs. limit=15.0 2024-08-20 00:52:39,907 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4598000.0, ans=0.0 2024-08-20 00:53:10,838 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 20 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-20 00:53:16,151 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4598200.0, ans=0.0 2024-08-20 00:53:35,337 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 450, loss[loss=0.1022, beats_loss=0.01042, ecapa_loss=0.0001525, whisper_loss=0.09022, over 18002.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01025, ecapa_loss=0.0001403, whisper_loss=0.08882, over 3380048.74 frames. ], batch size: 73, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:53:37,632 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4598300.0, ans=0.125 2024-08-20 00:53:45,678 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.647e+01 2.300e+01 2.526e+01 2.780e+01 3.592e+01, threshold=5.052e+01, percent-clipped=0.0 2024-08-20 00:53:46,483 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.77 vs. limit=12.0 2024-08-20 00:53:58,173 INFO [train_multi_KD3.py:845] (1/4) A total of 54 cuts. 
16 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-20 00:54:03,357 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4598400.0, ans=0.0 2024-08-20 00:54:06,195 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4598400.0, ans=0.95 2024-08-20 00:54:16,395 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4598500.0, ans=0.125 2024-08-20 00:54:35,038 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4598600.0, ans=0.125 2024-08-20 00:54:49,985 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.06 vs. limit=15.0 2024-08-20 00:54:58,853 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4598700.0, ans=0.0 2024-08-20 00:55:01,777 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 500, loss[loss=0.08689, beats_loss=0.01149, ecapa_loss=0.0001465, whisper_loss=0.07393, over 14181.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01028, ecapa_loss=0.00014, whisper_loss=0.08859, over 3480175.35 frames. ], batch size: 58, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:55:47,001 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.31 vs. limit=15.0 2024-08-20 00:55:47,888 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 28 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-20 00:55:48,196 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4599000.0, ans=0.0 2024-08-20 00:55:55,692 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 
24 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-20 00:55:57,549 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4599100.0, ans=0.2 2024-08-20 00:56:06,216 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 20 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-20 00:56:20,443 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4599200.0, ans=10.0 2024-08-20 00:56:23,021 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.48 vs. limit=15.0 2024-08-20 00:56:31,181 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 550, loss[loss=0.1167, beats_loss=0.008323, ecapa_loss=0.0001369, whisper_loss=0.107, over 22628.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01018, ecapa_loss=0.0001403, whisper_loss=0.08977, over 3560744.08 frames. ], batch size: 88, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:56:41,835 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.734e+01 2.287e+01 2.466e+01 2.719e+01 4.330e+01, threshold=4.932e+01, percent-clipped=0.0 2024-08-20 00:56:48,698 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.17 vs. limit=22.5 2024-08-20 00:57:20,629 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4599500.0, ans=0.0 2024-08-20 00:57:29,878 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 18 from LS+wenet, 17 from Vox, 15 fro AS 2024-08-20 00:57:34,018 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 
20 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-20 00:57:38,585 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4599600.0, ans=0.125 2024-08-20 00:57:42,671 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=4599600.0, ans=0.1 2024-08-20 00:57:46,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4599700.0, ans=0.0 2024-08-20 00:57:51,795 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4599700.0, ans=0.05 2024-08-20 00:57:53,165 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4599700.0, ans=0.0 2024-08-20 00:58:03,839 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 600, loss[loss=0.1175, beats_loss=0.008407, ecapa_loss=0.0001398, whisper_loss=0.1077, over 14700.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01022, ecapa_loss=0.0001401, whisper_loss=0.08966, over 3606605.62 frames. ], batch size: 51, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:58:11,978 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4599800.0, ans=0.0 2024-08-20 00:58:49,463 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4600000.0, ans=0.125 2024-08-20 00:58:56,230 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 
29 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-20 00:59:10,594 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4600100.0, ans=0.125 2024-08-20 00:59:21,116 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4600200.0, ans=0.125 2024-08-20 00:59:35,289 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 650, loss[loss=0.1105, beats_loss=0.008859, ecapa_loss=0.0001545, whisper_loss=0.1001, over 21681.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0102, ecapa_loss=0.0001401, whisper_loss=0.09, over 3629426.73 frames. ], batch size: 88, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:59:37,587 INFO [train_multi_KD3.py:845] (1/4) A total of 95 cuts. 26 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-20 00:59:46,512 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.232e+01 2.532e+01 2.844e+01 3.570e+02, threshold=5.065e+01, percent-clipped=2.0 2024-08-20 00:59:50,948 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4600300.0, ans=0.125 2024-08-20 01:00:02,084 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 27 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-20 01:00:20,697 WARNING [optim.py:496] (1/4) Scaling gradients by 0.07462587207555771, model_norm_threshold=50.64724349975586 2024-08-20 01:00:20,855 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.20, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=9.123e+04, grad_sumsq=9.123e+04, orig_rms_sq=1.000e+00 2024-08-20 01:00:21,101 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 24 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-20 01:00:23,318 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 
24 from LS+wenet, 14 from Vox, 48 fro AS 2024-08-20 01:00:58,567 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4600700.0, ans=0.0 2024-08-20 01:01:05,202 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 700, loss[loss=0.1264, beats_loss=0.006497, ecapa_loss=0.0001632, whisper_loss=0.1183, over 15187.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01029, ecapa_loss=0.0001403, whisper_loss=0.08934, over 3688280.23 frames. ], batch size: 59, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 01:01:07,587 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 23 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-20 01:01:10,003 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4600800.0, ans=0.05 2024-08-20 01:01:25,421 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 27 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-20 01:01:35,085 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4600900.0, ans=0.125 2024-08-20 01:01:38,140 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 19 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-20 01:01:40,813 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.68 vs. limit=22.5 2024-08-20 01:01:47,416 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4601000.0, ans=0.2 2024-08-20 01:01:50,782 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 
20 from LS+wenet, 20 from Vox, 29 from AS
2024-08-20 01:02:01,625 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4601100.0, ans=0.125
2024-08-20 01:02:10,526 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4601100.0, ans=0.1
2024-08-20 01:02:19,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4601200.0, ans=0.1
2024-08-20 01:02:24,469 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4601200.0, ans=0.125
2024-08-20 01:02:30,264 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.20 vs. limit=15.0
2024-08-20 01:02:33,634 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4601300.0, ans=0.0
2024-08-20 01:02:34,858 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 750, loss[loss=0.104, beats_loss=0.01127, ecapa_loss=0.0001179, whisper_loss=0.0916, over 24012.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01028, ecapa_loss=0.0001403, whisper_loss=0.08911, over 3727847.53 frames. ], batch size: 96, lr: 1.92e-03, grad_scale: 5.764607523034235e+17
2024-08-20 01:02:38,739 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 27 from LS+wenet, 21 from Vox, 35 from AS
2024-08-20 01:02:45,790 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.348e+01 2.626e+01 2.965e+01 6.787e+02, threshold=5.252e+01, percent-clipped=3.0
2024-08-20 01:03:02,781 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 24 from LS+wenet, 24 from Vox, 42 from AS
2024-08-20 01:03:18,807 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4601500.0, ans=0.125
2024-08-20 01:03:37,470 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=15.84 vs. limit=15.0
2024-08-20 01:03:38,943 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.42 vs. limit=12.0
2024-08-20 01:03:48,924 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 24 from LS+wenet, 10 from Vox, 35 from AS
2024-08-20 01:03:54,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4601700.0, ans=0.125
2024-08-20 01:04:00,100 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 800, loss[loss=0.1096, beats_loss=0.01134, ecapa_loss=0.0001035, whisper_loss=0.09721, over 19025.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01037, ecapa_loss=0.0001387, whisper_loss=0.08862, over 3721803.86 frames. ], batch size: 72, lr: 1.92e-03, grad_scale: 5.764607523034235e+17
2024-08-20 01:04:09,841 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4601800.0, ans=0.2
2024-08-20 01:04:22,917 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 26 from LS+wenet, 21 from Vox, 39 from AS
2024-08-20 01:04:37,515 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.51 vs. limit=15.0
2024-08-20 01:04:40,322 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4602000.0, ans=0.125
2024-08-20 01:04:41,638 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 23 from LS+wenet, 28 from Vox, 39 from AS
2024-08-20 01:04:45,058 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 24 from LS+wenet, 24 from Vox, 34 from AS
2024-08-20 01:04:48,745 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 17 from LS+wenet, 24 from Vox, 18 from AS
2024-08-20 01:04:49,377 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4602000.0, ans=0.2
2024-08-20 01:04:52,926 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4602100.0, ans=0.0
2024-08-20 01:05:08,174 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 25 from LS+wenet, 28 from Vox, 38 from AS
2024-08-20 01:05:23,776 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4602200.0, ans=0.0
2024-08-20 01:05:26,418 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 850, loss[loss=0.0962, beats_loss=0.009895, ecapa_loss=0.000134, whisper_loss=0.08496, over 18876.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.0103, ecapa_loss=0.0001399, whisper_loss=0.08864, over 3748996.09 frames. ], batch size: 74, lr: 1.92e-03, grad_scale: 5.764607523034235e+17
2024-08-20 01:05:34,021 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 19 from LS+wenet, 17 from Vox, 41 from AS
2024-08-20 01:05:37,196 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.751e+01 2.270e+01 2.498e+01 2.868e+01 4.208e+01, threshold=4.997e+01, percent-clipped=0.0
2024-08-20 01:05:40,551 INFO [train_multi_KD3.py:845] (1/4) A total of 49 cuts. 12 from LS+wenet, 13 from Vox, 24 from AS
2024-08-20 01:05:45,616 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 14 from LS+wenet, 23 from Vox, 20 from AS
2024-08-20 01:05:47,469 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 17 from LS+wenet, 20 from Vox, 37 from AS
2024-08-20 01:05:54,986 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4602400.0, ans=0.1
2024-08-20 01:06:04,199 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.14 vs. limit=6.0
2024-08-20 01:06:17,626 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.00 vs. limit=15.0
2024-08-20 01:06:18,770 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 23 from LS+wenet, 11 from Vox, 25 from AS
2024-08-20 01:06:25,304 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 19 from LS+wenet, 11 from Vox, 26 from AS
2024-08-20 01:06:29,266 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4602600.0, ans=0.0
2024-08-20 01:06:51,655 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4602800.0, ans=0.125
2024-08-20 01:06:53,215 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 900, loss[loss=0.09953, beats_loss=0.01007, ecapa_loss=0.000141, whisper_loss=0.08805, over 21083.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01022, ecapa_loss=0.0001386, whisper_loss=0.08895, over 3732122.13 frames. ], batch size: 86, lr: 1.92e-03, grad_scale: 5.764607523034235e+17
2024-08-20 01:07:04,014 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4602800.0, ans=0.125
2024-08-20 01:08:06,895 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4603200.0, ans=0.125
2024-08-20 01:08:06,949 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4603200.0, ans=0.125
2024-08-20 01:08:10,357 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4603200.0, ans=0.1
2024-08-20 01:08:16,816 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 from AS
2024-08-20 01:08:19,546 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 950, loss[loss=0.1087, beats_loss=0.008577, ecapa_loss=0.000146, whisper_loss=0.09865, over 14204.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01024, ecapa_loss=0.0001382, whisper_loss=0.08882, over 3737365.77 frames. ], batch size: 54, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 01:08:31,798 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.373e+01 2.705e+01 3.029e+01 3.919e+02, threshold=5.410e+01, percent-clipped=3.0
2024-08-20 01:08:52,818 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 17 from LS+wenet, 15 from Vox, 25 from AS
2024-08-20 01:09:09,661 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4603600.0, ans=0.2
2024-08-20 01:09:15,277 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4603600.0, ans=0.125
2024-08-20 01:09:20,997 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.01 vs. limit=6.0
2024-08-20 01:09:26,714 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.82 vs. limit=15.0
2024-08-20 01:09:43,329 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4603700.0, ans=0.0
2024-08-20 01:09:43,379 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4603700.0, ans=0.2
2024-08-20 01:09:46,127 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 1000, loss[loss=0.124, beats_loss=0.01037, ecapa_loss=0.0001488, whisper_loss=0.1122, over 22909.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01024, ecapa_loss=0.0001389, whisper_loss=0.08847, over 3748715.75 frames. ], batch size: 91, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 01:10:07,116 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4603900.0, ans=0.125
2024-08-20 01:10:12,879 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4603900.0, ans=0.0
2024-08-20 01:10:38,690 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 22 from LS+wenet, 24 from Vox, 37 from AS
2024-08-20 01:10:47,335 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 32 from LS+wenet, 19 from Vox, 39 from AS
2024-08-20 01:11:09,949 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4604200.0, ans=0.125
2024-08-20 01:11:09,960 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4604200.0, ans=0.125
2024-08-20 01:11:18,639 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 1050, loss[loss=0.111, beats_loss=0.01025, ecapa_loss=0.0001279, whisper_loss=0.09951, over 24038.00 frames. ], tot_loss[loss=0.1, beats_loss=0.01026, ecapa_loss=0.0001391, whisper_loss=0.08839, over 3757145.07 frames. ], batch size: 93, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 01:11:32,003 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.222e+01 2.426e+01 2.735e+01 4.130e+01, threshold=4.852e+01, percent-clipped=0.0
2024-08-20 01:11:49,574 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4604400.0, ans=0.09899494936611666
2024-08-20 01:11:49,609 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4604400.0, ans=0.0
2024-08-20 01:12:08,435 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 26 from LS+wenet, 23 from Vox, 30 from AS
2024-08-20 01:12:30,445 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4604700.0, ans=0.125
2024-08-20 01:12:49,406 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 1100, loss[loss=0.09064, beats_loss=0.01182, ecapa_loss=0.0001366, whisper_loss=0.07746, over 21640.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01025, ecapa_loss=0.0001385, whisper_loss=0.08896, over 3736915.42 frames. ], batch size: 89, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 01:12:55,912 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4604800.0, ans=0.1
2024-08-20 01:12:55,974 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4604800.0, ans=0.125
2024-08-20 01:13:02,817 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4604800.0, ans=0.1
2024-08-20 01:13:25,861 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.42 vs. limit=15.0
2024-08-20 01:13:31,300 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 22 from LS+wenet, 23 from Vox, 29 from AS
2024-08-20 01:13:35,824 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=16.72 vs. limit=15.0
2024-08-20 01:13:39,373 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.60 vs. limit=22.5
2024-08-20 01:14:09,354 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4605200.0, ans=0.04949747468305833
2024-08-20 01:14:10,848 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4605200.0, ans=0.1
2024-08-20 01:14:15,502 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 1150, loss[loss=0.1026, beats_loss=0.01043, ecapa_loss=0.0001367, whisper_loss=0.0908, over 16422.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01025, ecapa_loss=0.0001372, whisper_loss=0.08886, over 3717088.47 frames. ], batch size: 65, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 01:14:17,750 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.73 vs. limit=15.0
2024-08-20 01:14:26,689 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.26 vs. limit=15.0
2024-08-20 01:14:27,509 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.314e+01 2.565e+01 2.766e+01 1.499e+02, threshold=5.130e+01, percent-clipped=2.0
2024-08-20 01:14:27,670 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 19 from LS+wenet, 13 from Vox, 35 from AS
2024-08-20 01:14:52,018 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.84 vs. limit=15.0
2024-08-20 01:15:07,403 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.19 vs. limit=15.0
2024-08-20 01:15:15,048 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=4605600.0, ans=0.025
2024-08-20 01:15:34,089 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4605700.0, ans=0.0
2024-08-20 01:15:36,287 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.48 vs. limit=15.0
2024-08-20 01:15:40,909 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 1200, loss[loss=0.1034, beats_loss=0.01091, ecapa_loss=0.0001205, whisper_loss=0.09124, over 19301.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01025, ecapa_loss=0.0001388, whisper_loss=0.08963, over 3733490.31 frames. ], batch size: 75, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 01:15:49,184 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 20 from LS+wenet, 36 from Vox, 36 from AS
2024-08-20 01:15:58,075 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4605900.0, ans=0.125
2024-08-20 01:16:03,706 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 27 from LS+wenet, 21 from Vox, 42 from AS
2024-08-20 01:16:07,230 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 21 from LS+wenet, 20 from Vox, 26 from AS
2024-08-20 01:16:09,120 INFO [train_multi_KD3.py:845] (1/4) A total of 52 cuts. 15 from LS+wenet, 17 from Vox, 20 from AS
2024-08-20 01:16:15,344 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.38 vs. limit=15.0
2024-08-20 01:17:11,327 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4606200.0, ans=0.1
2024-08-20 01:17:15,236 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 1250, loss[loss=0.08233, beats_loss=0.01017, ecapa_loss=0.0001425, whisper_loss=0.07074, over 17950.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.0103, ecapa_loss=0.0001388, whisper_loss=0.08941, over 3745772.61 frames. ], batch size: 72, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 01:17:31,126 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4606300.0, ans=0.1
2024-08-20 01:17:32,635 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.713e+01 2.240e+01 2.537e+01 2.870e+01 6.660e+01, threshold=5.073e+01, percent-clipped=2.0
2024-08-20 01:17:56,111 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 13 from LS+wenet, 23 from Vox, 32 from AS
2024-08-20 01:18:05,995 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4606500.0, ans=0.05
2024-08-20 01:18:30,674 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 21 from LS+wenet, 26 from Vox, 38 from AS
2024-08-20 01:18:54,976 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 28 from LS+wenet, 17 from Vox, 33 from AS
2024-08-20 01:18:59,325 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4606700.0, ans=0.1
2024-08-20 01:19:04,915 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4606700.0, ans=0.125
2024-08-20 01:19:13,291 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 1300, loss[loss=0.1022, beats_loss=0.01256, ecapa_loss=0.0001309, whisper_loss=0.08836, over 20674.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01037, ecapa_loss=0.0001391, whisper_loss=0.08897, over 3753041.63 frames. ], batch size: 86, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 01:19:17,994 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 27 from LS+wenet, 26 from Vox, 40 from AS
2024-08-20 01:19:25,329 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4606800.0, ans=0.0
2024-08-20 01:19:27,329 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4606800.0, ans=0.125
2024-08-20 01:19:39,358 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4606900.0, ans=0.125
2024-08-20 01:19:40,921 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4606900.0, ans=0.125
2024-08-20 01:19:49,175 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4606900.0, ans=0.125
2024-08-20 01:19:55,288 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4607000.0, ans=0.1
2024-08-20 01:20:00,387 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4607000.0, ans=0.125
2024-08-20 01:20:12,138 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4607000.0, ans=0.0
2024-08-20 01:20:18,863 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 26 from LS+wenet, 18 from Vox, 26 from AS
2024-08-20 01:20:36,725 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 18 from LS+wenet, 19 from Vox, 26 from AS
2024-08-20 01:20:50,423 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 16 from LS+wenet, 15 from Vox, 36 from AS
2024-08-20 01:21:03,541 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 1350, loss[loss=0.1123, beats_loss=0.009247, ecapa_loss=0.0001115, whisper_loss=0.1019, over 20163.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01035, ecapa_loss=0.000139, whisper_loss=0.0888, over 3747508.12 frames. ], batch size: 74, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 01:21:09,213 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4607300.0, ans=0.125
2024-08-20 01:21:22,311 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.753e+01 2.244e+01 2.406e+01 2.687e+01 4.080e+01, threshold=4.812e+01, percent-clipped=0.0
2024-08-20 01:21:23,520 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=4607300.0, ans=15.0
2024-08-20 01:21:32,757 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.196e+00
2024-08-20 01:21:32,833 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4607400.0, ans=0.125
2024-08-20 01:21:35,124 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4607400.0, ans=0.125
2024-08-20 01:21:43,975 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 21 from LS+wenet, 10 from Vox, 26 from AS
2024-08-20 01:21:57,640 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 31 from LS+wenet, 28 from Vox, 31 from AS
2024-08-20 01:22:00,685 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4607500.0, ans=0.125
2024-08-20 01:22:06,092 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4607500.0, ans=0.125
2024-08-20 01:22:25,588 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4607600.0, ans=0.1
2024-08-20 01:22:43,194 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4607700.0, ans=0.125
2024-08-20 01:22:47,825 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4607700.0, ans=0.125
2024-08-20 01:22:49,839 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 20 from LS+wenet, 9 from Vox, 28 from AS
2024-08-20 01:23:07,433 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 1400, loss[loss=0.1152, beats_loss=0.009749, ecapa_loss=0.0001407, whisper_loss=0.1041, over 17097.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01039, ecapa_loss=0.0001383, whisper_loss=0.08894, over 3756585.94 frames. ], batch size: 68, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 01:23:35,068 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4607900.0, ans=0.0
2024-08-20 01:23:58,705 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 21 from LS+wenet, 15 from Vox, 22 from AS
2024-08-20 01:24:13,006 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4608000.0, ans=0.125
2024-08-20 01:24:20,152 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4608100.0, ans=0.125
2024-08-20 01:25:06,543 WARNING [optim.py:496] (1/4) Scaling gradients by 0.0315290167927742, model_norm_threshold=48.11598205566406
2024-08-20 01:25:06,699 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.21, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.963e+05, grad_sumsq=4.963e+05, orig_rms_sq=1.000e+00
2024-08-20 01:25:09,013 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 1450, loss[loss=0.1, beats_loss=0.01066, ecapa_loss=0.0001199, whisper_loss=0.08816, over 18105.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.0103, ecapa_loss=0.0001386, whisper_loss=0.08879, over 3730066.38 frames. ], batch size: 71, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 01:25:12,346 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.96 vs. limit=6.0
2024-08-20 01:25:21,967 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 17 from LS+wenet, 20 from Vox, 25 from AS
2024-08-20 01:25:26,205 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.770e+01 2.252e+01 2.461e+01 2.741e+01 1.526e+03, threshold=4.922e+01, percent-clipped=2.0
2024-08-20 01:25:33,731 INFO [train_multi_KD3.py:845] (1/4) A total of 96 cuts. 33 from LS+wenet, 19 from Vox, 44 from AS
2024-08-20 01:25:39,195 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4608400.0, ans=0.0
2024-08-20 01:25:57,792 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4608500.0, ans=0.2
2024-08-20 01:26:00,126 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4608500.0, ans=0.125
2024-08-20 01:26:14,875 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-20 01:26:19,388 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4608600.0, ans=0.2
2024-08-20 01:26:58,281 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 17 from LS+wenet, 21 from Vox, 26 from AS
2024-08-20 01:27:02,613 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4608600.0, ans=0.2
2024-08-20 01:27:09,075 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 22 from LS+wenet, 32 from Vox, 35 from AS
2024-08-20 01:27:17,483 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4608700.0, ans=0.0
2024-08-20 01:27:31,045 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 1500, loss[loss=0.07921, beats_loss=0.01265, ecapa_loss=0.0001157, whisper_loss=0.06541, over 22169.00 frames. ], tot_loss[loss=0.1, beats_loss=0.01027, ecapa_loss=0.0001386, whisper_loss=0.0884, over 3742697.31 frames. ], batch size: 89, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 01:27:45,065 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 24 from LS+wenet, 16 from Vox, 30 from AS
2024-08-20 01:28:04,721 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4608900.0, ans=0.125
2024-08-20 01:28:38,446 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.09 vs. limit=10.0
2024-08-20 01:28:40,129 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.34 vs. limit=15.0
2024-08-20 01:28:57,646 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.85 vs. limit=15.0
2024-08-20 01:29:05,233 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 14 from LS+wenet, 22 from Vox, 29 from AS
2024-08-20 01:29:13,093 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 1550, loss[loss=0.09359, beats_loss=0.011, ecapa_loss=0.0001487, whisper_loss=0.08111, over 18634.00 frames. ], tot_loss[loss=0.09921, beats_loss=0.0103, ecapa_loss=0.0001393, whisper_loss=0.08751, over 3739173.37 frames. ], batch size: 73, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 01:29:17,580 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4609300.0, ans=0.125
2024-08-20 01:29:20,976 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 30 from LS+wenet, 21 from Vox, 38 from AS
2024-08-20 01:29:27,016 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.801e+01 2.175e+01 2.465e+01 2.675e+01 6.220e+01, threshold=4.930e+01, percent-clipped=1.0
2024-08-20 01:30:05,075 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 30 from LS+wenet, 24 from Vox, 34 from AS
2024-08-20 01:30:10,475 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 22 from LS+wenet, 20 from Vox, 26 from AS
2024-08-20 01:30:10,790 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4609600.0, ans=0.0
2024-08-20 01:30:21,430 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.90 vs. limit=22.5
2024-08-20 01:30:49,735 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 1600, loss[loss=0.1331, beats_loss=0.007851, ecapa_loss=0.0001631, whisper_loss=0.1236, over 23178.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01021, ecapa_loss=0.0001399, whisper_loss=0.08847, over 3783348.70 frames. ], batch size: 88, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 01:30:59,181 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 29 from LS+wenet, 22 from Vox, 30 from AS
2024-08-20 01:31:17,384 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4609900.0, ans=0.2
2024-08-20 01:31:41,248 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 14 from LS+wenet, 24 from Vox, 19 from AS
2024-08-20 01:31:41,654 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.02 vs. limit=15.0
2024-08-20 01:31:46,143 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4610100.0, ans=0.125
2024-08-20 01:31:52,049 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4610100.0, ans=0.125
2024-08-20 01:32:01,568 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4610100.0, ans=0.1
2024-08-20 01:32:03,695 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.55 vs. limit=6.0
2024-08-20 01:32:05,344 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 25 from LS+wenet, 20 from Vox, 46 from AS
2024-08-20 01:32:13,424 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.92 vs. limit=12.0
2024-08-20 01:32:18,905 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4610200.0, ans=0.125
2024-08-20 01:32:24,532 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 1650, loss[loss=0.1118, beats_loss=0.006308, ecapa_loss=0.0001469, whisper_loss=0.104, over 14805.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.0102, ecapa_loss=0.000139, whisper_loss=0.08896, over 3833926.96 frames. ], batch size: 54, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 01:32:39,672 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.232e+01 2.495e+01 2.715e+01 1.384e+02, threshold=4.990e+01, percent-clipped=1.0
2024-08-20 01:32:50,707 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4610400.0, ans=0.1
2024-08-20 01:33:28,258 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.90 vs. limit=15.0
2024-08-20 01:33:34,706 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.49 vs. limit=10.0
2024-08-20 01:33:38,669 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4610700.0, ans=0.035
2024-08-20 01:33:57,988 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 1700, loss[loss=0.1058, beats_loss=0.008863, ecapa_loss=0.0001695, whisper_loss=0.09527, over 22719.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01014, ecapa_loss=0.0001395, whisper_loss=0.0895, over 3825444.99 frames. ], batch size: 91, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 01:34:00,273 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4610800.0, ans=0.2
2024-08-20 01:34:11,491 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 21 from LS+wenet, 13 from Vox, 36 from AS
2024-08-20 01:34:19,479 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4610900.0, ans=0.125
2024-08-20 01:34:19,830 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.24 vs. limit=15.0
2024-08-20 01:34:31,791 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.05 vs. limit=15.0
2024-08-20 01:35:26,064 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 1750, loss[loss=0.09087, beats_loss=0.01106, ecapa_loss=0.0001385, whisper_loss=0.07843, over 19051.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01011, ecapa_loss=0.0001377, whisper_loss=0.08961, over 3805500.81 frames. ], batch size: 77, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 01:35:38,033 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.660e+01 2.241e+01 2.449e+01 2.717e+01 4.269e+01, threshold=4.898e+01, percent-clipped=0.0
2024-08-20 01:35:59,900 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 20 from LS+wenet, 18 from Vox, 35 from AS
2024-08-20 01:36:03,224 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4611500.0, ans=0.125
2024-08-20 01:36:04,837 INFO [train_multi_KD3.py:845] (1/4) A total of 52 cuts. 18 from LS+wenet, 11 from Vox, 23 from AS
2024-08-20 01:36:17,366 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 21 from LS+wenet, 20 from Vox, 24 from AS
2024-08-20 01:36:20,082 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.25 vs. limit=15.0
2024-08-20 01:36:34,801 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.875e+05
2024-08-20 01:36:41,185 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4611700.0, ans=0.125
2024-08-20 01:36:45,872 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 26 from LS+wenet, 16 from Vox, 37 from AS
2024-08-20 01:36:52,764 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 1800, loss[loss=0.1235, beats_loss=0.008818, ecapa_loss=0.0001315, whisper_loss=0.1133, over 24653.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01019, ecapa_loss=0.0001359, whisper_loss=0.08924, over 3791709.25 frames. ], batch size: 94, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 01:36:58,673 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4611800.0, ans=0.125
2024-08-20 01:37:05,249 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4611800.0, ans=0.125
2024-08-20 01:37:07,286 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4611800.0, ans=0.125
2024-08-20 01:37:08,456 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 24 from LS+wenet, 25 from Vox, 42 from AS
2024-08-20 01:37:17,042 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 28 from LS+wenet, 13 from Vox, 24 from AS
2024-08-20 01:37:17,593 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4611900.0, ans=0.125
2024-08-20 01:37:17,793 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.18 vs. limit=10.0
2024-08-20 01:37:22,777 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4611900.0, ans=0.0
2024-08-20 01:37:27,727 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 33 from LS+wenet, 12 from Vox, 32 from AS
2024-08-20 01:37:28,120 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.15 vs. limit=15.0
2024-08-20 01:37:49,361 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. limit=6.0
2024-08-20 01:37:54,964 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 25 from LS+wenet, 15 from Vox, 43 from AS
2024-08-20 01:38:18,888 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 1850, loss[loss=0.09637, beats_loss=0.008286, ecapa_loss=0.0001573, whisper_loss=0.08651, over 16712.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0102, ecapa_loss=0.0001351, whisper_loss=0.0898, over 3795033.69 frames. ], batch size: 67, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 01:38:26,357 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4612300.0, ans=0.0
2024-08-20 01:38:31,306 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.236e+01 2.438e+01 2.690e+01 3.613e+01, threshold=4.877e+01, percent-clipped=0.0
2024-08-20 01:38:54,722 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.55 vs. limit=15.0
2024-08-20 01:38:55,648 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4612500.0, ans=0.125
2024-08-20 01:39:08,040 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4612500.0, ans=0.0
2024-08-20 01:39:19,436 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4612600.0, ans=0.09899494936611666
2024-08-20 01:39:20,847 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts.
20 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-20 01:39:21,107 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 01:39:33,601 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.12 vs. limit=15.0 2024-08-20 01:39:47,176 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 1900, loss[loss=0.109, beats_loss=0.01102, ecapa_loss=0.0001152, whisper_loss=0.09684, over 20716.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01022, ecapa_loss=0.0001364, whisper_loss=0.08951, over 3808777.11 frames. ], batch size: 81, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:39:49,109 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 25 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-20 01:39:58,664 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.145e+00 2024-08-20 01:40:00,831 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.24 vs. limit=15.0 2024-08-20 01:40:19,603 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4612900.0, ans=0.1 2024-08-20 01:40:31,486 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4613000.0, ans=0.125 2024-08-20 01:40:40,109 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4613100.0, ans=0.0 2024-08-20 01:40:50,305 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 13 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-20 01:40:54,078 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 
17 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-20 01:40:56,535 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=16.31 vs. limit=15.0 2024-08-20 01:41:14,212 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 1950, loss[loss=0.09895, beats_loss=0.01081, ecapa_loss=0.0001381, whisper_loss=0.08677, over 21751.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.0103, ecapa_loss=0.0001361, whisper_loss=0.08886, over 3790953.08 frames. ], batch size: 85, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:41:26,077 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.348e+01 2.572e+01 2.844e+01 4.490e+01, threshold=5.144e+01, percent-clipped=0.0 2024-08-20 01:41:53,923 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 37 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-20 01:41:59,900 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4613500.0, ans=0.125 2024-08-20 01:42:22,050 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4613700.0, ans=0.2 2024-08-20 01:42:25,342 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4613700.0, ans=0.125 2024-08-20 01:42:25,344 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4613700.0, ans=0.125 2024-08-20 01:42:39,930 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 2000, loss[loss=0.08144, beats_loss=0.01084, ecapa_loss=0.0001414, whisper_loss=0.06918, over 14048.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01034, ecapa_loss=0.0001361, whisper_loss=0.0887, over 3795836.38 frames. 
], batch size: 57, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:42:51,139 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.99 vs. limit=15.0 2024-08-20 01:43:04,428 WARNING [optim.py:496] (1/4) Scaling gradients by 0.0779990628361702, model_norm_threshold=51.44282531738281 2024-08-20 01:43:04,586 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.10, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.291e+04, grad_sumsq=4.291e+04, orig_rms_sq=1.000e+00 2024-08-20 01:43:13,784 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4614000.0, ans=0.125 2024-08-20 01:43:17,539 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4614000.0, ans=0.125 2024-08-20 01:43:29,179 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4614000.0, ans=0.0 2024-08-20 01:43:32,966 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 23 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-20 01:43:41,639 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4614100.0, ans=0.125 2024-08-20 01:43:57,576 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4614200.0, ans=0.0 2024-08-20 01:44:07,480 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 2050, loss[loss=0.08361, beats_loss=0.01443, ecapa_loss=9.737e-05, whisper_loss=0.0682, over 21482.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01032, ecapa_loss=0.0001356, whisper_loss=0.089, over 3802089.48 frames. 
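The `Clipping_scale=2.0, grad-norm quartiles … threshold=…` lines, together with the occasional `WARNING … Scaling gradients by …, model_norm_threshold=…`, are consistent with a clipping rule where the threshold is derived from recent gradient norms (in these lines the threshold is close to `clipping_scale` times the logged median). The sketch below is a hypothetical reconstruction of that mechanism, not the optimizer's actual code:

```python
import collections

class QuartileGradClipper:
    """Hypothetical sketch of median-based gradient clipping.

    Keeps a window of recent gradient norms and clips to
    clipping_scale * median(window), mirroring the
    "grad-norm quartiles ... threshold" log lines. Assumption:
    the real optimizer differs in details (window size, warmup).
    """
    def __init__(self, clipping_scale=2.0, window=400):
        self.clipping_scale = clipping_scale
        self.norms = collections.deque(maxlen=window)

    def scale_for(self, grad_norm):
        """Return the factor to multiply gradients by for this step."""
        self.norms.append(grad_norm)
        ordered = sorted(self.norms)
        median = ordered[len(ordered) // 2]
        threshold = self.clipping_scale * median
        # When grad_norm exceeds the threshold, gradients are scaled
        # down (the "Scaling gradients by ..." warning); otherwise
        # they pass through unchanged.
        return min(1.0, threshold / grad_norm) if grad_norm > 0 else 1.0
```

For example, after a stretch of norms around 25, a sudden norm of 100 would be scaled by roughly 50/100 = 0.5, the same shape of event as the warning above where a norm far beyond `model_norm_threshold` triggered scaling by ~0.078.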
], batch size: 83, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:44:19,354 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.219e+01 2.452e+01 2.809e+01 6.595e+02, threshold=4.904e+01, percent-clipped=1.0 2024-08-20 01:44:39,269 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 19 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-20 01:44:47,773 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4614500.0, ans=0.125 2024-08-20 01:44:49,071 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 15 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-20 01:44:52,942 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.90 vs. limit=15.0 2024-08-20 01:45:05,202 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4614600.0, ans=0.1 2024-08-20 01:45:11,809 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.76 vs. limit=15.0 2024-08-20 01:45:33,355 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 2100, loss[loss=0.1302, beats_loss=0.009132, ecapa_loss=0.0001514, whisper_loss=0.1195, over 22086.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01034, ecapa_loss=0.000136, whisper_loss=0.08851, over 3786748.30 frames. 
], batch size: 87, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:45:36,402 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4614800.0, ans=0.1 2024-08-20 01:45:49,442 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4614900.0, ans=0.125 2024-08-20 01:45:56,101 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4614900.0, ans=0.125 2024-08-20 01:45:59,596 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.49 vs. limit=12.0 2024-08-20 01:46:08,831 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 20 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-20 01:46:35,979 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4615100.0, ans=0.1 2024-08-20 01:46:47,408 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=4615200.0, ans=6.0 2024-08-20 01:46:53,849 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4615200.0, ans=0.0 2024-08-20 01:46:55,791 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.28 vs. limit=22.5 2024-08-20 01:46:59,051 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 20 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-20 01:47:00,095 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 2150, loss[loss=0.1029, beats_loss=0.008954, ecapa_loss=0.0001598, whisper_loss=0.09234, over 14039.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01033, ecapa_loss=0.0001364, whisper_loss=0.08884, over 3777027.10 frames. 
], batch size: 56, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:47:07,937 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4615300.0, ans=0.125 2024-08-20 01:47:09,544 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4615300.0, ans=0.125 2024-08-20 01:47:12,058 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.717e+01 2.213e+01 2.411e+01 2.746e+01 4.203e+01, threshold=4.821e+01, percent-clipped=0.0 2024-08-20 01:47:31,264 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 19 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-20 01:48:06,716 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 28 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-20 01:48:10,748 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4615700.0, ans=0.0 2024-08-20 01:48:16,213 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.57 vs. limit=15.0 2024-08-20 01:48:19,355 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 21 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-20 01:48:21,151 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4615700.0, ans=0.125 2024-08-20 01:48:25,545 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 2200, loss[loss=0.09827, beats_loss=0.009947, ecapa_loss=0.0001788, whisper_loss=0.08653, over 17849.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01028, ecapa_loss=0.0001372, whisper_loss=0.08943, over 3772553.60 frames. 
], batch size: 77, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:48:27,899 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4615800.0, ans=0.125 2024-08-20 01:48:43,216 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=4615900.0, ans=10.0 2024-08-20 01:48:44,974 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4615900.0, ans=0.125 2024-08-20 01:48:48,677 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. limit=6.0 2024-08-20 01:49:00,903 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.27 vs. limit=10.0 2024-08-20 01:49:20,334 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4616100.0, ans=0.125 2024-08-20 01:49:27,033 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4616100.0, ans=0.125 2024-08-20 01:49:27,673 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.51 vs. limit=15.0 2024-08-20 01:49:38,047 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4616200.0, ans=0.125 2024-08-20 01:49:41,180 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 26 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-20 01:49:50,342 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 2250, loss[loss=0.08182, beats_loss=0.00946, ecapa_loss=0.0001242, whisper_loss=0.07112, over 14885.00 frames. 
], tot_loss[loss=0.1011, beats_loss=0.01036, ecapa_loss=0.0001363, whisper_loss=0.08933, over 3762162.64 frames. ], batch size: 56, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:49:54,444 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 01:50:02,022 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.187e+01 2.427e+01 2.680e+01 3.409e+01, threshold=4.854e+01, percent-clipped=0.0 2024-08-20 01:50:08,265 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4616400.0, ans=0.125 2024-08-20 01:50:10,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4616400.0, ans=10.0 2024-08-20 01:50:18,341 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 15 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-20 01:50:31,020 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.62 vs. limit=22.5 2024-08-20 01:50:36,799 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 30 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-20 01:50:38,748 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 22 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-20 01:50:39,003 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4616500.0, ans=0.2 2024-08-20 01:50:51,123 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.03 vs. 
limit=10.0 2024-08-20 01:51:08,244 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4616700.0, ans=0.0 2024-08-20 01:51:15,684 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 2300, loss[loss=0.1218, beats_loss=0.008513, ecapa_loss=0.0001211, whisper_loss=0.1121, over 21399.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01035, ecapa_loss=0.0001361, whisper_loss=0.08988, over 3801001.49 frames. ], batch size: 81, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:51:34,264 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=4616900.0, ans=0.5 2024-08-20 01:51:34,736 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.90 vs. limit=12.0 2024-08-20 01:51:35,501 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 25 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-20 01:51:43,086 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4616900.0, ans=0.125 2024-08-20 01:52:05,245 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 19 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-20 01:52:35,960 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4617200.0, ans=0.0 2024-08-20 01:52:43,083 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 2350, loss[loss=0.08906, beats_loss=0.01343, ecapa_loss=0.0001213, whisper_loss=0.07442, over 21779.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01031, ecapa_loss=0.0001375, whisper_loss=0.09048, over 3831593.40 frames. ], batch size: 87, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:52:48,846 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 
37 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-20 01:52:55,223 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.932e+01 2.315e+01 2.598e+01 2.990e+01 3.797e+01, threshold=5.197e+01, percent-clipped=0.0 2024-08-20 01:53:04,518 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4617400.0, ans=0.125 2024-08-20 01:53:12,206 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 23 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-20 01:53:18,868 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 31 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-20 01:53:34,235 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4617600.0, ans=0.0 2024-08-20 01:54:07,164 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 2400, loss[loss=0.1024, beats_loss=0.01206, ecapa_loss=0.0001405, whisper_loss=0.08893, over 22663.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01028, ecapa_loss=0.000139, whisper_loss=0.09093, over 3790977.48 frames. ], batch size: 92, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:54:08,058 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 29 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-20 01:54:45,801 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4618000.0, ans=0.125 2024-08-20 01:55:33,039 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 2450, loss[loss=0.08295, beats_loss=0.01221, ecapa_loss=0.0001393, whisper_loss=0.06934, over 21788.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01028, ecapa_loss=0.0001394, whisper_loss=0.0909, over 3805993.04 frames. 
], batch size: 92, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:55:45,086 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.716e+01 2.204e+01 2.412e+01 2.711e+01 4.337e+02, threshold=4.825e+01, percent-clipped=1.0 2024-08-20 01:55:50,860 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 20 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-20 01:55:52,673 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4618400.0, ans=0.2 2024-08-20 01:55:54,195 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4618400.0, ans=0.125 2024-08-20 01:56:18,566 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.13 vs. limit=15.0 2024-08-20 01:56:20,268 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 22 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-20 01:56:23,023 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.31 vs. limit=22.5 2024-08-20 01:56:31,835 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.87 vs. limit=15.0 2024-08-20 01:57:03,896 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 2500, loss[loss=0.1237, beats_loss=0.009563, ecapa_loss=0.0001426, whisper_loss=0.1127, over 23362.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01028, ecapa_loss=0.0001395, whisper_loss=0.09086, over 3796859.86 frames. ], batch size: 92, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:57:09,958 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.95 vs. 
limit=15.0 2024-08-20 01:57:23,557 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 13 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-20 01:57:26,255 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4618900.0, ans=0.0 2024-08-20 01:57:27,653 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4618900.0, ans=0.125 2024-08-20 01:57:34,298 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 16 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-20 01:57:43,473 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 19 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-20 01:57:47,558 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.42 vs. limit=12.0 2024-08-20 01:57:57,044 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 26 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-20 01:57:57,490 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.51 vs. limit=15.0 2024-08-20 01:58:30,363 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 25 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-20 01:58:32,181 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 2550, loss[loss=0.1037, beats_loss=0.01109, ecapa_loss=0.0001302, whisper_loss=0.09133, over 21189.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01032, ecapa_loss=0.0001387, whisper_loss=0.09076, over 3805734.96 frames. ], batch size: 83, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:58:36,551 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 21 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-20 01:58:37,048 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.65 vs. 
limit=22.5 2024-08-20 01:58:37,205 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.68 vs. limit=15.0 2024-08-20 01:58:37,885 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 28 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-20 01:58:44,472 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.624e+01 2.306e+01 2.523e+01 2.847e+01 3.512e+02, threshold=5.047e+01, percent-clipped=2.0 2024-08-20 01:59:23,610 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 23 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-20 01:59:37,347 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 28 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-20 02:00:01,015 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 2600, loss[loss=0.129, beats_loss=0.008932, ecapa_loss=0.0001299, whisper_loss=0.1187, over 14216.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01029, ecapa_loss=0.000139, whisper_loss=0.09113, over 3832887.42 frames. ], batch size: 53, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 02:00:11,870 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.35 vs. limit=15.0 2024-08-20 02:00:35,420 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4620000.0, ans=0.07 2024-08-20 02:00:41,266 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 26 from LS+wenet, 31 from Vox, 37 fro AS 2024-08-20 02:00:43,239 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 
29 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-20 02:00:58,703 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4620100.0, ans=0.0 2024-08-20 02:01:16,718 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4620200.0, ans=0.1 2024-08-20 02:01:18,021 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 28 from LS+wenet, 15 from Vox, 16 fro AS 2024-08-20 02:01:30,038 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 2650, loss[loss=0.1184, beats_loss=0.008077, ecapa_loss=0.0001421, whisper_loss=0.1089, over 20644.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01025, ecapa_loss=0.0001392, whisper_loss=0.09098, over 3830073.72 frames. ], batch size: 81, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 02:01:42,574 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.354e+01 2.571e+01 2.953e+01 6.961e+01, threshold=5.142e+01, percent-clipped=1.0 2024-08-20 02:01:43,624 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=4620300.0, ans=0.1 2024-08-20 02:01:57,214 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 21 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-20 02:02:04,257 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4620500.0, ans=0.125 2024-08-20 02:02:17,556 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 23 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-20 02:02:33,624 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 25 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-20 02:02:38,513 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.00 vs. 
limit=15.0 2024-08-20 02:02:48,298 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4620700.0, ans=0.1 2024-08-20 02:02:58,542 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 2700, loss[loss=0.1183, beats_loss=0.01045, ecapa_loss=0.0001256, whisper_loss=0.1066, over 15981.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01027, ecapa_loss=0.000139, whisper_loss=0.09133, over 3845869.34 frames. ], batch size: 64, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 02:03:10,937 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 13 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-20 02:03:12,604 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4620800.0, ans=0.125 2024-08-20 02:03:20,180 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4620900.0, ans=0.0 2024-08-20 02:03:21,388 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4620900.0, ans=0.125 2024-08-20 02:03:52,881 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4621100.0, ans=0.0 2024-08-20 02:03:58,149 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 20 from LS+wenet, 26 from Vox, 21 fro AS 2024-08-20 02:04:18,917 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4621200.0, ans=0.1 2024-08-20 02:04:24,792 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 2750, loss[loss=0.09836, beats_loss=0.009622, ecapa_loss=0.0001543, whisper_loss=0.08719, over 14608.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01028, ecapa_loss=0.0001394, whisper_loss=0.09075, over 3818326.37 frames. 
], batch size: 61, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 02:04:27,721 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4621300.0, ans=0.125 2024-08-20 02:04:36,897 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.283e+01 2.512e+01 2.707e+01 3.446e+01, threshold=5.024e+01, percent-clipped=0.0 2024-08-20 02:04:46,254 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.12 vs. limit=22.5 2024-08-20 02:04:48,895 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4621400.0, ans=0.1 2024-08-20 02:04:54,617 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4621400.0, ans=0.0 2024-08-20 02:04:57,786 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 19 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-20 02:05:04,688 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 19 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-20 02:05:23,196 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 19 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-20 02:05:31,690 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 28 from LS+wenet, 13 from Vox, 50 fro AS 2024-08-20 02:05:35,586 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 22 from LS+wenet, 16 from Vox, 50 fro AS 2024-08-20 02:05:49,540 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 30 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-20 02:05:53,030 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 2800, loss[loss=0.1125, beats_loss=0.009548, ecapa_loss=0.0001418, whisper_loss=0.1015, over 23061.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0102, ecapa_loss=0.0001396, whisper_loss=0.09141, over 3809942.32 frames. 
], batch size: 90, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 02:06:34,329 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 25 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-20 02:06:34,592 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4622000.0, ans=0.0 2024-08-20 02:06:34,618 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4622000.0, ans=0.0 2024-08-20 02:06:49,901 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 24 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-20 02:07:10,172 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.80 vs. limit=12.0 2024-08-20 02:07:22,912 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 2850, loss[loss=0.09001, beats_loss=0.01069, ecapa_loss=0.0001152, whisper_loss=0.07817, over 15367.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01024, ecapa_loss=0.0001383, whisper_loss=0.09111, over 3810789.77 frames. ], batch size: 58, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 02:07:34,838 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4622300.0, ans=0.125 2024-08-20 02:07:35,620 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.251e+01 2.480e+01 2.760e+01 4.318e+01, threshold=4.959e+01, percent-clipped=0.0 2024-08-20 02:08:10,065 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4622500.0, ans=0.0 2024-08-20 02:08:13,886 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.35 vs. 
limit=15.0 2024-08-20 02:08:31,707 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.24 vs. limit=15.0 2024-08-20 02:08:47,452 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 23 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-20 02:08:52,965 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 2900, loss[loss=0.1065, beats_loss=0.009113, ecapa_loss=0.0001356, whisper_loss=0.09601, over 15971.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01028, ecapa_loss=0.0001377, whisper_loss=0.09075, over 3787283.06 frames. ], batch size: 61, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 02:08:54,786 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 18 from LS+wenet, 30 from Vox, 26 fro AS 2024-08-20 02:09:01,262 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4622800.0, ans=0.1 2024-08-20 02:09:24,251 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4622900.0, ans=0.125 2024-08-20 02:09:35,957 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 11 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-20 02:09:47,357 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4623100.0, ans=0.0 2024-08-20 02:09:55,855 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 12 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-20 02:10:01,252 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 
26 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-20 02:10:11,809 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4623200.0, ans=0.1 2024-08-20 02:10:13,596 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4623200.0, ans=0.2 2024-08-20 02:10:20,482 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 27 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-20 02:10:22,386 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 2950, loss[loss=0.09571, beats_loss=0.01261, ecapa_loss=0.0001516, whisper_loss=0.08158, over 21930.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01026, ecapa_loss=0.0001397, whisper_loss=0.09028, over 3792109.97 frames. ], batch size: 93, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:10:24,530 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4623300.0, ans=0.125 2024-08-20 02:10:28,003 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4623300.0, ans=0.1 2024-08-20 02:10:34,571 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.285e+01 2.491e+01 2.729e+01 3.693e+01, threshold=4.982e+01, percent-clipped=0.0 2024-08-20 02:10:35,441 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.60 vs. limit=12.0 2024-08-20 02:10:38,359 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4623400.0, ans=0.1 2024-08-20 02:10:45,395 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 
20 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-20 02:10:47,555 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4623400.0, ans=0.1 2024-08-20 02:10:54,136 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 19 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-20 02:11:08,005 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4623500.0, ans=0.1 2024-08-20 02:11:18,905 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4623600.0, ans=0.125 2024-08-20 02:11:41,155 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 22 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-20 02:11:48,925 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 3000, loss[loss=0.09597, beats_loss=0.01132, ecapa_loss=0.0002062, whisper_loss=0.08259, over 14350.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01028, ecapa_loss=0.0001396, whisper_loss=0.09035, over 3804962.12 frames. ], batch size: 66, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:11:48,926 INFO [train_multi_KD3.py:1140] (1/4) Computing validation loss 2024-08-20 02:12:25,588 INFO [train_multi_KD3.py:1150] (1/4) Epoch 32, validation on ASR_libri: loss=0.2541, beats_loss=0, ecapa_loss=0.000511, whisper_loss=0.249, over 931116.00 frames. 2024-08-20 02:12:46,456 INFO [train_multi_KD3.py:1150] (1/4) Epoch 32, validation on SV_voxceleb1: loss=0.003941, beats_loss=0, ecapa_loss=0.0003941, whisper_loss=0, over 944235.00 frames. 2024-08-20 02:14:20,862 INFO [train_multi_KD3.py:1150] (1/4) Epoch 32, validation on AT_audioset: loss=0.02293, beats_loss=0.02293, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-20 02:14:20,865 INFO [train_multi_KD3.py:1156] (1/4) Maximum memory allocated so far is 30838MB 2024-08-20 02:14:33,112 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.30 vs. limit=15.0 2024-08-20 02:14:36,093 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.90 vs. limit=15.0 2024-08-20 02:14:42,553 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4623900.0, ans=0.125 2024-08-20 02:15:09,397 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 22 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-20 02:15:30,561 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4624200.0, ans=0.125 2024-08-20 02:15:32,210 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4624200.0, ans=0.125 2024-08-20 02:15:35,634 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4624200.0, ans=0.0 2024-08-20 02:15:39,167 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4624200.0, ans=0.1 2024-08-20 02:15:41,950 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 14 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-20 02:15:43,618 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4624300.0, ans=0.125 2024-08-20 02:15:44,441 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 3050, loss[loss=0.1212, beats_loss=0.01012, ecapa_loss=0.0001657, whisper_loss=0.1095, over 19119.00 frames. 
], tot_loss[loss=0.1018, beats_loss=0.0103, ecapa_loss=0.0001396, whisper_loss=0.0901, over 3815148.26 frames. ], batch size: 82, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:15:49,011 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4624300.0, ans=0.0 2024-08-20 02:15:50,420 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 12 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-20 02:15:55,054 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 23 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-20 02:15:56,065 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.348e+01 2.639e+01 2.982e+01 8.249e+01, threshold=5.278e+01, percent-clipped=1.0 2024-08-20 02:15:58,882 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4624300.0, ans=0.025 2024-08-20 02:16:02,178 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4624400.0, ans=0.125 2024-08-20 02:16:30,160 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4624500.0, ans=0.0 2024-08-20 02:16:35,333 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=4624600.0, ans=0.5 2024-08-20 02:16:47,589 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4624600.0, ans=0.125 2024-08-20 02:16:54,629 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 
23 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-20 02:16:58,412 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4624700.0, ans=0.1 2024-08-20 02:17:07,318 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4624700.0, ans=0.0 2024-08-20 02:17:08,617 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 20 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-20 02:17:09,618 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 3100, loss[loss=0.09199, beats_loss=0.0119, ecapa_loss=0.0001281, whisper_loss=0.0788, over 18079.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01037, ecapa_loss=0.0001392, whisper_loss=0.0903, over 3834637.72 frames. ], batch size: 72, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:17:17,106 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4624800.0, ans=0.09899494936611666 2024-08-20 02:17:29,093 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4624900.0, ans=0.125 2024-08-20 02:17:29,452 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.94 vs. limit=15.0 2024-08-20 02:17:44,694 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4625000.0, ans=0.1 2024-08-20 02:17:46,592 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4625000.0, ans=0.0 2024-08-20 02:18:33,641 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 3150, loss[loss=0.1141, beats_loss=0.008294, ecapa_loss=0.0001977, whisper_loss=0.1039, over 20248.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01029, ecapa_loss=0.0001407, whisper_loss=0.0914, over 3845668.88 frames. 
], batch size: 87, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:18:34,135 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 25 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-20 02:18:34,336 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=4625300.0, ans=0.02 2024-08-20 02:18:44,615 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.790e+01 2.262e+01 2.448e+01 2.716e+01 4.425e+01, threshold=4.896e+01, percent-clipped=0.0 2024-08-20 02:18:49,137 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4625400.0, ans=10.0 2024-08-20 02:18:52,984 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.50 vs. limit=22.5 2024-08-20 02:19:05,390 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4625500.0, ans=0.1 2024-08-20 02:19:11,832 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4625500.0, ans=0.1 2024-08-20 02:19:22,030 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4625600.0, ans=0.125 2024-08-20 02:19:22,216 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4625600.0, ans=0.125 2024-08-20 02:19:24,072 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4625600.0, ans=0.0 2024-08-20 02:19:56,718 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 3200, loss[loss=0.1161, beats_loss=0.009236, ecapa_loss=0.0001158, whisper_loss=0.1058, over 15553.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01046, ecapa_loss=0.000141, whisper_loss=0.08992, over 3855838.61 frames. 
], batch size: 57, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:20:55,551 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4626100.0, ans=0.125 2024-08-20 02:21:05,555 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 32 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-20 02:21:20,039 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 3250, loss[loss=0.08999, beats_loss=0.01083, ecapa_loss=0.0001448, whisper_loss=0.07771, over 17522.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01051, ecapa_loss=0.0001416, whisper_loss=0.08992, over 3829189.61 frames. ], batch size: 71, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:21:32,378 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.285e+01 2.517e+01 2.834e+01 4.980e+01, threshold=5.034e+01, percent-clipped=1.0 2024-08-20 02:21:33,298 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 30 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-20 02:21:44,071 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4626400.0, ans=0.0 2024-08-20 02:21:56,114 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 16 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-20 02:22:04,766 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 30 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-20 02:22:23,322 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.33 vs. limit=8.0 2024-08-20 02:22:27,599 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4626600.0, ans=0.125 2024-08-20 02:22:38,812 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 
18 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-20 02:22:39,193 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4626700.0, ans=0.125 2024-08-20 02:22:40,332 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 17 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-20 02:22:43,986 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4626700.0, ans=0.125 2024-08-20 02:22:46,850 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 3300, loss[loss=0.1054, beats_loss=0.008208, ecapa_loss=0.0001454, whisper_loss=0.09574, over 16855.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01039, ecapa_loss=0.0001425, whisper_loss=0.09047, over 3804836.40 frames. ], batch size: 63, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:22:58,748 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.59 vs. limit=15.0 2024-08-20 02:23:10,429 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4626900.0, ans=0.125 2024-08-20 02:23:24,997 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 24 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-20 02:23:27,114 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.59 vs. limit=22.5 2024-08-20 02:23:35,301 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.46 vs. 
limit=15.0 2024-08-20 02:23:44,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4627100.0, ans=0.125 2024-08-20 02:23:55,099 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4627200.0, ans=0.1 2024-08-20 02:24:08,863 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 3350, loss[loss=0.09872, beats_loss=0.01015, ecapa_loss=0.0001556, whisper_loss=0.08701, over 20792.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01044, ecapa_loss=0.0001421, whisper_loss=0.0903, over 3770509.97 frames. ], batch size: 89, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:24:19,912 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4627300.0, ans=0.0 2024-08-20 02:24:20,716 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.820e+01 2.203e+01 2.406e+01 2.784e+01 4.307e+01, threshold=4.813e+01, percent-clipped=0.0 2024-08-20 02:24:31,525 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4627400.0, ans=0.1 2024-08-20 02:24:32,713 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-20 02:24:38,021 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4627400.0, ans=0.125 2024-08-20 02:24:39,584 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 
24 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-20 02:25:21,842 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4627700.0, ans=0.1 2024-08-20 02:25:25,489 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.17 vs. limit=15.0 2024-08-20 02:25:32,819 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 3400, loss[loss=0.09922, beats_loss=0.01199, ecapa_loss=0.0001571, whisper_loss=0.08566, over 21618.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01035, ecapa_loss=0.0001419, whisper_loss=0.09029, over 3754054.57 frames. ], batch size: 94, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:25:39,743 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.48 vs. limit=22.5 2024-08-20 02:25:53,918 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.80 vs. limit=15.0 2024-08-20 02:26:03,811 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.94 vs. limit=22.5 2024-08-20 02:26:22,431 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4628100.0, ans=10.0 2024-08-20 02:26:33,613 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 26 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-20 02:26:45,213 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.95 vs. limit=15.0 2024-08-20 02:26:55,321 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 3450, loss[loss=0.09587, beats_loss=0.01141, ecapa_loss=0.0001146, whisper_loss=0.08331, over 16617.00 frames. 
], tot_loss[loss=0.1016, beats_loss=0.01032, ecapa_loss=0.0001418, whisper_loss=0.08987, over 3724130.34 frames. ], batch size: 63, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:27:01,972 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.20 vs. limit=15.0 2024-08-20 02:27:07,172 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.276e+01 2.600e+01 2.959e+01 4.699e+01, threshold=5.199e+01, percent-clipped=0.0 2024-08-20 02:27:22,290 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.68 vs. limit=12.0 2024-08-20 02:27:24,785 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 24 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-20 02:27:33,724 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4628500.0, ans=0.125 2024-08-20 02:27:42,275 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4628500.0, ans=0.2 2024-08-20 02:27:53,750 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 15 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-20 02:27:58,050 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4628600.0, ans=0.125 2024-08-20 02:28:16,607 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4628700.0, ans=0.0 2024-08-20 02:28:19,446 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 3500, loss[loss=0.1174, beats_loss=0.01047, ecapa_loss=0.00014, whisper_loss=0.1055, over 23272.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01041, ecapa_loss=0.0001408, whisper_loss=0.08928, over 3740344.41 frames. 
], batch size: 93, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:28:26,941 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4628800.0, ans=0.2 2024-08-20 02:28:46,193 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4628900.0, ans=0.0 2024-08-20 02:28:58,103 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 24 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-20 02:28:58,911 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.17 vs. limit=22.5 2024-08-20 02:29:26,861 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4629200.0, ans=0.125 2024-08-20 02:29:29,928 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 20 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-20 02:29:34,006 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.37 vs. limit=15.0 2024-08-20 02:29:41,585 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4629200.0, ans=0.0 2024-08-20 02:29:44,372 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 3550, loss[loss=0.08165, beats_loss=0.01231, ecapa_loss=0.0001248, whisper_loss=0.0681, over 17536.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01046, ecapa_loss=0.0001406, whisper_loss=0.08856, over 3747001.59 frames. ], batch size: 71, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:29:44,635 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 
30 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-20 02:29:56,320 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.386e+01 2.605e+01 2.983e+01 3.766e+02, threshold=5.211e+01, percent-clipped=1.0 2024-08-20 02:29:59,156 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4629300.0, ans=0.0 2024-08-20 02:30:52,614 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 24 from LS+wenet, 16 from Vox, 52 fro AS 2024-08-20 02:30:57,925 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4629600.0, ans=0.1 2024-08-20 02:31:22,268 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 17 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-20 02:31:24,047 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 3600, loss[loss=0.07821, beats_loss=0.01286, ecapa_loss=0.0001287, whisper_loss=0.06406, over 19183.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01043, ecapa_loss=0.0001408, whisper_loss=0.08886, over 3760393.12 frames. ], batch size: 79, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:31:45,449 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.35 vs. limit=15.0 2024-08-20 02:31:49,169 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 
19 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-20 02:32:03,232 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=4629900.0, ans=10.0 2024-08-20 02:32:20,214 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4630000.0, ans=0.0 2024-08-20 02:32:24,120 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4630000.0, ans=0.125 2024-08-20 02:33:01,579 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 20 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-20 02:33:03,502 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 24 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-20 02:33:15,259 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 3650, loss[loss=0.11, beats_loss=0.009855, ecapa_loss=0.0001404, whisper_loss=0.09877, over 14479.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01041, ecapa_loss=0.0001408, whisper_loss=0.08926, over 3779330.74 frames. ], batch size: 57, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:33:29,585 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.219e+01 2.439e+01 2.661e+01 4.108e+01, threshold=4.879e+01, percent-clipped=0.0 2024-08-20 02:33:34,131 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4630400.0, ans=0.125 2024-08-20 02:33:39,237 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4630400.0, ans=0.2 2024-08-20 02:33:50,232 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.44 vs. 
limit=15.0 2024-08-20 02:33:51,953 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=4630400.0, ans=0.05 2024-08-20 02:33:53,344 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4630400.0, ans=0.0 2024-08-20 02:33:56,087 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4630500.0, ans=0.125 2024-08-20 02:33:58,953 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.73 vs. limit=22.5 2024-08-20 02:34:12,095 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 22 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-20 02:34:24,259 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4630600.0, ans=0.125 2024-08-20 02:34:27,202 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4630600.0, ans=0.125 2024-08-20 02:34:40,392 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4630600.0, ans=0.2 2024-08-20 02:34:50,005 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4630700.0, ans=0.125 2024-08-20 02:34:56,817 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 27 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-20 02:34:59,491 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4630700.0, ans=0.125 2024-08-20 02:35:06,824 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 3700, loss[loss=0.08905, beats_loss=0.01108, ecapa_loss=0.0001721, whisper_loss=0.07625, over 19865.00 frames. 
], tot_loss[loss=0.1012, beats_loss=0.0104, ecapa_loss=0.0001408, whisper_loss=0.08938, over 3775126.81 frames. ], batch size: 86, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:35:22,489 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 26 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-20 02:35:32,331 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 23 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-20 02:35:58,539 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 22 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-20 02:36:03,456 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4631000.0, ans=0.0 2024-08-20 02:36:12,070 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 34 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-20 02:36:15,355 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4631100.0, ans=0.0 2024-08-20 02:36:17,786 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=4631100.0, ans=15.0 2024-08-20 02:36:25,618 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.92 vs. limit=15.0 2024-08-20 02:36:51,412 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4631200.0, ans=0.0 2024-08-20 02:36:55,669 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 24 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-20 02:36:57,023 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 21 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-20 02:36:58,530 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 3750, loss[loss=0.1033, beats_loss=0.01091, ecapa_loss=0.0001263, whisper_loss=0.09113, over 14615.00 frames. 
], tot_loss[loss=0.1014, beats_loss=0.01043, ecapa_loss=0.0001402, whisper_loss=0.08958, over 3767203.16 frames. ], batch size: 59, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:37:07,961 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4631300.0, ans=0.2 2024-08-20 02:37:13,337 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.670e+01 2.276e+01 2.480e+01 2.901e+01 4.929e+01, threshold=4.959e+01, percent-clipped=1.0 2024-08-20 02:37:18,016 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.69 vs. limit=22.5 2024-08-20 02:37:21,751 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4631400.0, ans=0.1 2024-08-20 02:37:30,324 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4631400.0, ans=0.0 2024-08-20 02:37:44,192 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.52 vs. limit=15.0 2024-08-20 02:37:56,545 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 23 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-20 02:37:56,786 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4631500.0, ans=0.125 2024-08-20 02:37:58,867 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4631500.0, ans=0.125 2024-08-20 02:38:24,937 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. 
limit=6.0 2024-08-20 02:38:47,582 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 3800, loss[loss=0.1276, beats_loss=0.008387, ecapa_loss=0.0001332, whisper_loss=0.1179, over 14392.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01043, ecapa_loss=0.0001394, whisper_loss=0.08906, over 3730861.91 frames. ], batch size: 54, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:38:53,144 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.53 vs. limit=15.0 2024-08-20 02:39:17,615 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4631900.0, ans=0.125 2024-08-20 02:39:24,594 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4631900.0, ans=0.07 2024-08-20 02:39:36,221 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4632000.0, ans=0.1 2024-08-20 02:39:59,194 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 28 from LS+wenet, 30 from Vox, 28 fro AS 2024-08-20 02:40:12,746 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=4632100.0, ans=0.5 2024-08-20 02:40:24,342 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4632200.0, ans=0.1 2024-08-20 02:40:26,513 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.82 vs. limit=10.0 2024-08-20 02:40:28,158 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 
18 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-20 02:40:28,433 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4632200.0, ans=0.0 2024-08-20 02:40:40,662 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 3850, loss[loss=0.0859, beats_loss=0.01226, ecapa_loss=0.0001622, whisper_loss=0.07202, over 16639.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01039, ecapa_loss=0.000142, whisper_loss=0.08926, over 3753232.01 frames. ], batch size: 70, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:40:55,854 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.800e+01 2.447e+01 2.712e+01 3.131e+01 3.132e+02, threshold=5.425e+01, percent-clipped=6.0 2024-08-20 02:40:56,461 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=4632300.0, ans=15.0 2024-08-20 02:41:02,662 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4632400.0, ans=0.2 2024-08-20 02:41:44,743 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4632600.0, ans=0.125 2024-08-20 02:42:10,583 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4632700.0, ans=0.125 2024-08-20 02:42:23,819 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.73 vs. limit=15.0 2024-08-20 02:42:26,920 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 3900, loss[loss=0.1144, beats_loss=0.0101, ecapa_loss=0.0001292, whisper_loss=0.103, over 18621.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01041, ecapa_loss=0.0001417, whisper_loss=0.08936, over 3752563.03 frames. 
], batch size: 73, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:42:27,158 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 30 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-20 02:42:32,968 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 28 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-20 02:42:35,373 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4632800.0, ans=0.125 2024-08-20 02:42:43,477 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 02:43:15,356 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.53 vs. limit=22.5 2024-08-20 02:43:27,226 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4633000.0, ans=0.05 2024-08-20 02:43:28,887 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 21 from LS+wenet, 10 from Vox, 29 fro AS 2024-08-20 02:43:56,706 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4633200.0, ans=0.2 2024-08-20 02:43:59,153 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4633200.0, ans=0.1 2024-08-20 02:44:17,842 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 3950, loss[loss=0.1292, beats_loss=0.008385, ecapa_loss=0.0001325, whisper_loss=0.1195, over 24035.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01038, ecapa_loss=0.000142, whisper_loss=0.08972, over 3778511.96 frames. 
], batch size: 91, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:44:33,328 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.316e+01 2.520e+01 2.771e+01 2.265e+02, threshold=5.040e+01, percent-clipped=2.0 2024-08-20 02:44:45,025 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4633400.0, ans=0.125 2024-08-20 02:44:49,818 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 02:44:50,282 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0 2024-08-20 02:44:55,989 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4633400.0, ans=0.125 2024-08-20 02:45:38,615 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 21 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-20 02:45:40,889 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4633600.0, ans=0.1 2024-08-20 02:45:48,118 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4633700.0, ans=0.125 2024-08-20 02:46:07,598 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 28 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-20 02:46:09,264 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 4000, loss[loss=0.1022, beats_loss=0.009946, ecapa_loss=0.000148, whisper_loss=0.09082, over 21927.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01038, ecapa_loss=0.0001428, whisper_loss=0.09008, over 3798692.64 frames. 
], batch size: 88, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:46:09,777 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4633800.0, ans=0.125 2024-08-20 02:46:32,987 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4633900.0, ans=0.125 2024-08-20 02:47:01,609 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4634000.0, ans=0.1 2024-08-20 02:47:39,360 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 21 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-20 02:47:46,914 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4634200.0, ans=0.0 2024-08-20 02:48:05,837 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 4050, loss[loss=0.1054, beats_loss=0.01006, ecapa_loss=0.0001501, whisper_loss=0.09386, over 22499.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01035, ecapa_loss=0.0001422, whisper_loss=0.0907, over 3830212.13 frames. 
], batch size: 93, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:48:22,558 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.656e+01 2.303e+01 2.496e+01 2.881e+01 4.421e+01, threshold=4.993e+01, percent-clipped=0.0 2024-08-20 02:48:35,145 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4634400.0, ans=0.0 2024-08-20 02:48:42,408 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4634400.0, ans=0.125 2024-08-20 02:48:52,309 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4634500.0, ans=0.125 2024-08-20 02:49:22,076 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4634600.0, ans=0.125 2024-08-20 02:49:34,675 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4634600.0, ans=0.125 2024-08-20 02:49:43,172 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 20 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-20 02:49:45,431 INFO [train_multi_KD3.py:845] (1/4) A total of 49 cuts. 14 from LS+wenet, 9 from Vox, 26 fro AS 2024-08-20 02:49:52,926 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4634700.0, ans=0.125 2024-08-20 02:50:04,637 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 21 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-20 02:50:06,615 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 4100, loss[loss=0.101, beats_loss=0.01189, ecapa_loss=0.0001285, whisper_loss=0.08785, over 18473.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01032, ecapa_loss=0.0001418, whisper_loss=0.09036, over 3838456.18 frames. 
], batch size: 72, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:50:13,639 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4634800.0, ans=0.125 2024-08-20 02:50:21,438 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 32 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-20 02:50:21,741 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4634800.0, ans=0.2 2024-08-20 02:50:53,121 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4635000.0, ans=0.1 2024-08-20 02:50:57,124 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 15 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-20 02:50:59,901 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4635000.0, ans=0.0 2024-08-20 02:51:01,710 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 26 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-20 02:51:27,640 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 24 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-20 02:52:01,059 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 4150, loss[loss=0.1254, beats_loss=0.009968, ecapa_loss=0.0001605, whisper_loss=0.1138, over 21781.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01032, ecapa_loss=0.0001417, whisper_loss=0.09077, over 3871727.31 frames. 
], batch size: 88, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:52:10,584 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4635300.0, ans=0.0 2024-08-20 02:52:16,095 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.376e+01 2.677e+01 2.991e+01 4.680e+01, threshold=5.353e+01, percent-clipped=0.0 2024-08-20 02:52:17,051 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.35 vs. limit=15.0 2024-08-20 02:52:56,313 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4635500.0, ans=0.1 2024-08-20 02:53:11,543 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.19 vs. limit=12.0 2024-08-20 02:53:30,269 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 35 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-20 02:53:30,519 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4635700.0, ans=0.125 2024-08-20 02:53:38,214 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.70 vs. limit=15.0 2024-08-20 02:53:52,211 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 4200, loss[loss=0.08209, beats_loss=0.01094, ecapa_loss=0.0001283, whisper_loss=0.06987, over 17494.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01047, ecapa_loss=0.0001419, whisper_loss=0.08995, over 3833341.78 frames. 
], batch size: 67, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:53:57,001 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4635800.0, ans=0.1 2024-08-20 02:53:58,479 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 17 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-20 02:54:07,478 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 31 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-20 02:54:14,791 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4635900.0, ans=0.2 2024-08-20 02:54:42,018 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=4636000.0, ans=0.95 2024-08-20 02:54:46,108 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 29 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-20 02:54:48,891 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4636000.0, ans=0.1 2024-08-20 02:55:10,864 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 27 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-20 02:55:18,203 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.95 vs. limit=12.0 2024-08-20 02:55:25,398 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4636200.0, ans=0.2 2024-08-20 02:55:36,805 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 
27 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-20 02:55:43,787 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4636200.0, ans=0.0 2024-08-20 02:55:46,293 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4636300.0, ans=0.1 2024-08-20 02:55:48,360 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 4250, loss[loss=0.09388, beats_loss=0.01027, ecapa_loss=0.0001418, whisper_loss=0.08219, over 22253.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01048, ecapa_loss=0.0001421, whisper_loss=0.08974, over 3830503.16 frames. ], batch size: 91, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:56:06,339 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.193e+01 2.444e+01 2.797e+01 4.359e+01, threshold=4.889e+01, percent-clipped=0.0 2024-08-20 02:56:13,377 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4636400.0, ans=0.0 2024-08-20 02:56:13,755 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.00 vs. limit=15.0 2024-08-20 02:56:26,191 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 27 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-20 02:56:36,902 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4636500.0, ans=0.125 2024-08-20 02:56:56,360 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.51 vs. limit=22.5 2024-08-20 02:57:01,275 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.45 vs. 
limit=12.0 2024-08-20 02:57:11,343 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4636600.0, ans=0.0 2024-08-20 02:57:26,715 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.14 vs. limit=15.0 2024-08-20 02:57:48,313 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 4300, loss[loss=0.07645, beats_loss=0.009934, ecapa_loss=0.0001454, whisper_loss=0.06507, over 13025.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01051, ecapa_loss=0.0001414, whisper_loss=0.08961, over 3846693.60 frames. ], batch size: 51, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:58:02,983 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 37 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-20 02:58:05,021 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.79 vs. limit=22.5 2024-08-20 02:58:24,790 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.32 vs. limit=22.5 2024-08-20 02:59:25,293 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 32 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-20 02:59:35,796 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.63 vs. limit=22.5 2024-08-20 02:59:47,878 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4637200.0, ans=0.1 2024-08-20 02:59:52,019 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 4350, loss[loss=0.09551, beats_loss=0.009141, ecapa_loss=0.0001369, whisper_loss=0.085, over 17380.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01046, ecapa_loss=0.0001401, whisper_loss=0.09027, over 3857992.01 frames. 
], batch size: 69, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:59:59,566 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4637300.0, ans=0.125 2024-08-20 03:00:08,994 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.299e+01 2.481e+01 2.858e+01 4.859e+01, threshold=4.961e+01, percent-clipped=0.0 2024-08-20 03:00:14,934 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4637400.0, ans=0.125 2024-08-20 03:00:32,898 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 29 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-20 03:00:33,037 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4637400.0, ans=0.125 2024-08-20 03:00:46,238 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.52 vs. limit=15.0 2024-08-20 03:00:53,675 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4637500.0, ans=0.125 2024-08-20 03:01:21,220 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4637600.0, ans=0.125 2024-08-20 03:01:40,135 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 14 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-20 03:01:40,356 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4637700.0, ans=0.2 2024-08-20 03:01:46,577 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 26 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-20 03:01:53,489 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 4400, loss[loss=0.09343, beats_loss=0.01176, ecapa_loss=0.0001644, whisper_loss=0.08003, over 21819.00 frames. 
], tot_loss[loss=0.1022, beats_loss=0.01044, ecapa_loss=0.0001408, whisper_loss=0.09037, over 3854658.58 frames. ], batch size: 92, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 03:02:38,942 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.35 vs. limit=12.0 2024-08-20 03:02:41,579 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4638000.0, ans=0.125 2024-08-20 03:02:43,701 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4638000.0, ans=0.0 2024-08-20 03:02:45,474 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 11 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-20 03:02:53,996 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=4638000.0, ans=0.025 2024-08-20 03:03:15,106 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 30 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-20 03:03:18,119 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.09 vs. limit=15.0 2024-08-20 03:03:20,836 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.07 vs. limit=15.0 2024-08-20 03:03:36,099 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 17 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-20 03:03:56,299 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 4450, loss[loss=0.1074, beats_loss=0.009194, ecapa_loss=0.0001603, whisper_loss=0.09662, over 22094.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01041, ecapa_loss=0.0001407, whisper_loss=0.09052, over 3868813.71 frames. 
], batch size: 93, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 03:04:12,967 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.659e+01 2.158e+01 2.452e+01 2.719e+01 3.768e+01, threshold=4.904e+01, percent-clipped=0.0 2024-08-20 03:04:18,521 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4638400.0, ans=0.125 2024-08-20 03:05:00,420 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.14 vs. limit=15.0 2024-08-20 03:05:31,313 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=8.92 vs. limit=12.0 2024-08-20 03:05:48,059 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4638700.0, ans=0.125 2024-08-20 03:06:00,011 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 4500, loss[loss=0.1136, beats_loss=0.007283, ecapa_loss=0.0001727, whisper_loss=0.1046, over 14529.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01029, ecapa_loss=0.0001417, whisper_loss=0.09078, over 3820278.66 frames. ], batch size: 60, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 03:06:11,854 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4638800.0, ans=0.125 2024-08-20 03:06:45,229 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.26 vs. limit=22.5 2024-08-20 03:06:52,785 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.51 vs. 
limit=15.0 2024-08-20 03:06:55,522 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4639000.0, ans=0.2 2024-08-20 03:07:29,480 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4639100.0, ans=0.2 2024-08-20 03:08:05,425 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 4550, loss[loss=0.0778, beats_loss=0.01255, ecapa_loss=0.0001378, whisper_loss=0.06387, over 20140.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01029, ecapa_loss=0.0001408, whisper_loss=0.09019, over 3795505.07 frames. ], batch size: 84, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 03:08:23,755 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.329e+01 2.605e+01 2.856e+01 5.309e+01, threshold=5.211e+01, percent-clipped=1.0 2024-08-20 03:08:28,682 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4639400.0, ans=0.0 2024-08-20 03:08:50,108 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4639400.0, ans=0.0 2024-08-20 03:09:35,899 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4639600.0, ans=0.1 2024-08-20 03:10:13,197 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 4600, loss[loss=0.09644, beats_loss=0.008931, ecapa_loss=0.000161, whisper_loss=0.0859, over 22185.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01031, ecapa_loss=0.0001404, whisper_loss=0.09029, over 3819231.89 frames. ], batch size: 88, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 03:10:41,140 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 29 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-20 03:10:51,824 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 
21 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-20 03:10:56,758 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 28 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-20 03:11:13,861 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4640000.0, ans=0.0 2024-08-20 03:11:28,153 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=4640000.0, ans=0.05 2024-08-20 03:12:06,192 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 24 from LS+wenet, 31 from Vox, 26 fro AS 2024-08-20 03:12:24,829 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 4650, loss[loss=0.1162, beats_loss=0.009953, ecapa_loss=0.0001421, whisper_loss=0.1048, over 16678.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0103, ecapa_loss=0.0001396, whisper_loss=0.09056, over 3819140.92 frames. ], batch size: 66, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 03:12:41,321 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.331e+01 2.446e+01 2.750e+01 3.848e+01, threshold=4.892e+01, percent-clipped=0.0 2024-08-20 03:13:26,141 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4640500.0, ans=0.0 2024-08-20 03:13:41,710 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 25 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-20 03:13:46,682 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 25 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-20 03:13:50,157 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.77 vs. limit=15.0 2024-08-20 03:13:59,157 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 
33 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-20 03:14:30,531 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 4700, loss[loss=0.09564, beats_loss=0.01231, ecapa_loss=0.0001495, whisper_loss=0.08183, over 14480.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0103, ecapa_loss=0.0001408, whisper_loss=0.09053, over 3806344.49 frames. ], batch size: 55, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 03:14:40,932 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 30 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-20 03:16:22,825 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4641200.0, ans=0.0 2024-08-20 03:16:34,887 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 4750, loss[loss=0.0931, beats_loss=0.01223, ecapa_loss=0.0001178, whisper_loss=0.07969, over 21670.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01032, ecapa_loss=0.0001406, whisper_loss=0.09055, over 3836990.70 frames. ], batch size: 87, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 03:16:45,901 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 26 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-20 03:16:48,359 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 14 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-20 03:16:50,657 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 26 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-20 03:16:53,300 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.351e+01 2.626e+01 2.955e+01 4.641e+01, threshold=5.251e+01, percent-clipped=0.0 2024-08-20 03:16:57,265 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 
19 from LS+wenet, 14 from Vox, 32 from AS 2024-08-20 03:17:03,223 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4641400.0, ans=0.07 2024-08-20 03:17:08,079 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4641400.0, ans=0.07 2024-08-20 03:17:10,119 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 30 from LS+wenet, 25 from Vox, 34 from AS 2024-08-20 03:17:10,333 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4641400.0, ans=0.125 2024-08-20 03:17:42,127 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 35 from LS+wenet, 19 from Vox, 27 from AS 2024-08-20 03:17:42,346 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4641500.0, ans=0.05 2024-08-20 03:18:02,674 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4641600.0, ans=0.2 2024-08-20 03:18:07,827 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4641600.0, ans=0.1 2024-08-20 03:18:16,059 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4641700.0, ans=0.125 2024-08-20 03:18:23,661 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4641700.0, ans=0.125 2024-08-20 03:18:26,294 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 32 from LS+wenet, 18 from Vox, 30 from AS 2024-08-20 03:18:40,913 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 4800, loss[loss=0.08593, beats_loss=0.01076, ecapa_loss=0.0001314, whisper_loss=0.07385, over 19596.00 frames. 
], tot_loss[loss=0.1019, beats_loss=0.0104, ecapa_loss=0.0001401, whisper_loss=0.09008, over 3814492.39 frames. ], batch size: 79, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 03:18:43,916 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4641800.0, ans=0.125 2024-08-20 03:19:06,691 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4641900.0, ans=0.1 2024-08-20 03:19:39,167 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4642000.0, ans=0.125 2024-08-20 03:20:29,608 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4642200.0, ans=0.125 2024-08-20 03:20:37,040 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4642200.0, ans=0.125 2024-08-20 03:20:46,511 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 4850, loss[loss=0.09508, beats_loss=0.01304, ecapa_loss=0.0001208, whisper_loss=0.08083, over 21279.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01036, ecapa_loss=0.0001405, whisper_loss=0.09049, over 3826690.42 frames. 
], batch size: 86, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 03:21:02,375 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.946e+01 2.316e+01 2.589e+01 3.055e+01 7.163e+01, threshold=5.178e+01, percent-clipped=1.0 2024-08-20 03:21:21,531 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4642400.0, ans=0.1 2024-08-20 03:21:28,745 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4642400.0, ans=0.125 2024-08-20 03:21:39,640 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4642500.0, ans=0.0 2024-08-20 03:22:00,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4642600.0, ans=0.125 2024-08-20 03:22:06,572 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0 2024-08-20 03:22:18,358 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4642700.0, ans=0.125 2024-08-20 03:22:21,901 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4642700.0, ans=0.0 2024-08-20 03:22:27,180 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.03 vs. limit=22.5 2024-08-20 03:22:35,258 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 4900, loss[loss=0.1034, beats_loss=0.01005, ecapa_loss=0.0001711, whisper_loss=0.09166, over 17482.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01047, ecapa_loss=0.0001403, whisper_loss=0.0904, over 3860778.14 frames. 
], batch size: 76, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 03:22:38,960 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4642800.0, ans=0.0 2024-08-20 03:22:46,444 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 33 from LS+wenet, 20 from Vox, 35 from AS 2024-08-20 03:22:56,609 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 28 from LS+wenet, 23 from Vox, 36 from AS 2024-08-20 03:22:56,818 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4642900.0, ans=0.125 2024-08-20 03:23:52,385 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4643100.0, ans=0.0 2024-08-20 03:23:56,983 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.01 vs. limit=6.0 2024-08-20 03:23:58,287 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 15 from LS+wenet, 14 from Vox, 27 from AS 2024-08-20 03:24:01,298 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.70 vs. limit=6.0 2024-08-20 03:24:09,128 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 30 from LS+wenet, 29 from Vox, 33 from AS 2024-08-20 03:24:11,154 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 42 from LS+wenet, 17 from Vox, 30 from AS 2024-08-20 03:24:20,431 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 4950, loss[loss=0.1373, beats_loss=0.008607, ecapa_loss=0.0001399, whisper_loss=0.1273, over 22285.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01037, ecapa_loss=0.0001407, whisper_loss=0.0914, over 3869364.08 frames. 
], batch size: 89, lr: 1.92e-03, grad_scale: 1.152921504606847e+18 2024-08-20 03:24:20,973 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4643300.0, ans=0.0 2024-08-20 03:24:22,807 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4643300.0, ans=0.125 2024-08-20 03:24:34,462 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.306e+01 2.561e+01 2.855e+01 3.879e+01, threshold=5.122e+01, percent-clipped=0.0 2024-08-20 03:24:34,996 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 15 from LS+wenet, 16 from Vox, 25 from AS 2024-08-20 03:25:23,252 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=4643600.0, ans=10.0 2024-08-20 03:25:53,835 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 25 from LS+wenet, 22 from Vox, 42 from AS 2024-08-20 03:25:55,357 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 5000, loss[loss=0.09731, beats_loss=0.01206, ecapa_loss=0.0001417, whisper_loss=0.08383, over 21685.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01046, ecapa_loss=0.0001403, whisper_loss=0.09043, over 3849061.99 frames. ], batch size: 89, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:26:06,437 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 22 from LS+wenet, 20 from Vox, 33 from AS 2024-08-20 03:26:06,558 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4643800.0, ans=0.125 2024-08-20 03:26:15,795 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 22 from LS+wenet, 15 from Vox, 43 from AS 2024-08-20 03:26:22,336 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.77 vs. 
limit=15.0 2024-08-20 03:26:27,590 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4643900.0, ans=0.1 2024-08-20 03:26:35,468 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4644000.0, ans=0.0 2024-08-20 03:26:35,492 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4644000.0, ans=0.125 2024-08-20 03:26:54,957 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 16 from LS+wenet, 19 from Vox, 28 from AS 2024-08-20 03:27:13,154 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 16 from LS+wenet, 17 from Vox, 35 from AS 2024-08-20 03:27:16,836 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4644200.0, ans=0.025 2024-08-20 03:27:27,630 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 5050, loss[loss=0.1157, beats_loss=0.008421, ecapa_loss=0.0001579, whisper_loss=0.1057, over 21191.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01046, ecapa_loss=0.0001402, whisper_loss=0.08962, over 3805735.87 frames. ], batch size: 81, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:27:44,294 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.280e+01 2.515e+01 2.844e+01 3.725e+01, threshold=5.031e+01, percent-clipped=0.0 2024-08-20 03:27:44,553 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 16 from LS+wenet, 22 from Vox, 24 from AS 2024-08-20 03:27:51,928 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 30 from LS+wenet, 13 from Vox, 45 from AS 2024-08-20 03:27:52,247 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4644400.0, ans=0.2 2024-08-20 03:27:58,497 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 
19 from LS+wenet, 15 from Vox, 19 from AS 2024-08-20 03:28:00,468 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 26 from LS+wenet, 23 from Vox, 44 from AS 2024-08-20 03:28:16,428 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4644500.0, ans=0.1 2024-08-20 03:28:20,363 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 22 from LS+wenet, 17 from Vox, 20 from AS 2024-08-20 03:28:23,681 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4644600.0, ans=0.125 2024-08-20 03:28:26,121 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.62 vs. limit=15.0 2024-08-20 03:28:34,975 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4644600.0, ans=0.0 2024-08-20 03:28:36,531 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 25 from LS+wenet, 20 from Vox, 38 from AS 2024-08-20 03:28:57,058 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 5100, loss[loss=0.1069, beats_loss=0.01122, ecapa_loss=0.0001202, whisper_loss=0.09448, over 18350.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01042, ecapa_loss=0.0001402, whisper_loss=0.09034, over 3787584.84 frames. ], batch size: 70, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:29:07,419 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4644800.0, ans=0.0 2024-08-20 03:29:17,442 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4644900.0, ans=0.125 2024-08-20 03:29:38,931 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 
29 from LS+wenet, 17 from Vox, 36 from AS 2024-08-20 03:29:54,220 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.79 vs. limit=5.0 2024-08-20 03:30:12,077 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4645200.0, ans=0.0 2024-08-20 03:30:18,117 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 29 from LS+wenet, 15 from Vox, 44 from AS 2024-08-20 03:30:23,573 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 23 from LS+wenet, 30 from Vox, 38 from AS 2024-08-20 03:30:25,184 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.47 vs. limit=12.0 2024-08-20 03:30:27,099 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 5150, loss[loss=0.1082, beats_loss=0.01013, ecapa_loss=0.0001511, whisper_loss=0.09659, over 23184.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01049, ecapa_loss=0.0001404, whisper_loss=0.09063, over 3787881.17 frames. ], batch size: 92, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:30:37,857 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 17 from LS+wenet, 15 from Vox, 31 from AS 2024-08-20 03:30:42,419 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.233e+01 2.389e+01 2.694e+01 3.675e+01, threshold=4.778e+01, percent-clipped=0.0 2024-08-20 03:30:51,684 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 29 from LS+wenet, 20 from Vox, 40 from AS 2024-08-20 03:31:02,739 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4645500.0, ans=0.1 2024-08-20 03:31:10,987 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4645500.0, ans=0.125 2024-08-20 03:31:18,418 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 
26 from LS+wenet, 18 from Vox, 27 from AS 2024-08-20 03:31:20,128 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4645600.0, ans=0.1 2024-08-20 03:31:48,086 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 20 from LS+wenet, 18 from Vox, 28 from AS 2024-08-20 03:31:54,533 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 5200, loss[loss=0.1072, beats_loss=0.01033, ecapa_loss=0.0001213, whisper_loss=0.09566, over 23125.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01047, ecapa_loss=0.0001409, whisper_loss=0.09085, over 3802959.72 frames. ], batch size: 92, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:32:08,115 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4645800.0, ans=0.5 2024-08-20 03:32:18,054 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 25 from LS+wenet, 22 from Vox, 36 from AS 2024-08-20 03:32:28,989 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4646000.0, ans=0.0 2024-08-20 03:32:42,730 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 35 from LS+wenet, 9 from Vox, 44 from AS 2024-08-20 03:32:49,372 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.94 vs. limit=22.5 2024-08-20 03:32:50,987 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 35 from LS+wenet, 19 from Vox, 34 from AS 2024-08-20 03:33:17,409 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4646200.0, ans=0.0 2024-08-20 03:33:24,349 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 5250, loss[loss=0.1044, beats_loss=0.009627, ecapa_loss=0.0001312, whisper_loss=0.09344, over 22098.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.01049, ecapa_loss=0.0001398, whisper_loss=0.09017, over 3787244.46 frames. ], batch size: 87, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:33:33,727 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 from AS 2024-08-20 03:33:37,306 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4646300.0, ans=0.0 2024-08-20 03:33:40,162 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.836e+01 2.321e+01 2.600e+01 2.824e+01 7.148e+01, threshold=5.200e+01, percent-clipped=2.0 2024-08-20 03:33:48,892 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4646400.0, ans=0.0 2024-08-20 03:33:49,027 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4646400.0, ans=0.2 2024-08-20 03:34:13,483 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4646500.0, ans=0.2 2024-08-20 03:34:18,304 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4646600.0, ans=0.0 2024-08-20 03:34:55,852 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 5300, loss[loss=0.1012, beats_loss=0.01019, ecapa_loss=9.032e-05, whisper_loss=0.09013, over 14523.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01049, ecapa_loss=0.0001389, whisper_loss=0.09014, over 3765930.81 frames. ], batch size: 53, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:34:58,303 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 25 from LS+wenet, 27 from Vox, 36 from AS 2024-08-20 03:35:02,657 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.00 vs. 
limit=10.0 2024-08-20 03:35:12,094 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 24 from LS+wenet, 23 from Vox, 41 from AS 2024-08-20 03:35:45,529 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.82 vs. limit=6.0 2024-08-20 03:36:36,420 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 5350, loss[loss=0.1098, beats_loss=0.007305, ecapa_loss=0.0002385, whisper_loss=0.1001, over 16190.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01047, ecapa_loss=0.0001399, whisper_loss=0.08998, over 3765658.68 frames. ], batch size: 72, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:36:49,284 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4647300.0, ans=0.125 2024-08-20 03:36:53,552 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 33 from LS+wenet, 13 from Vox, 31 from AS 2024-08-20 03:36:57,669 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.184e+01 2.426e+01 2.687e+01 4.168e+01, threshold=4.852e+01, percent-clipped=0.0 2024-08-20 03:37:37,628 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 19 from LS+wenet, 22 from Vox, 26 from AS 2024-08-20 03:38:04,159 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 25 from LS+wenet, 19 from Vox, 30 from AS 2024-08-20 03:38:35,821 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 5400, loss[loss=0.1098, beats_loss=0.01015, ecapa_loss=0.0001353, whisper_loss=0.09835, over 21594.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01041, ecapa_loss=0.0001398, whisper_loss=0.09078, over 3803201.11 frames. ], batch size: 84, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:39:03,619 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 17 from LS+wenet, 15 from Vox, 36 from AS 2024-08-20 03:39:08,171 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 
25 from LS+wenet, 32 from Vox, 36 from AS 2024-08-20 03:39:25,961 INFO [train_multi_KD3.py:845] (1/4) A total of 52 cuts. 14 from LS+wenet, 17 from Vox, 21 from AS 2024-08-20 03:39:26,349 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 03:39:45,478 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4648100.0, ans=0.0 2024-08-20 03:40:06,645 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.24 vs. limit=22.5 2024-08-20 03:40:28,659 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 5450, loss[loss=0.09853, beats_loss=0.01139, ecapa_loss=0.0001564, whisper_loss=0.08557, over 16136.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0103, ecapa_loss=0.00014, whisper_loss=0.09107, over 3783983.48 frames. ], batch size: 69, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:40:45,558 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.272e+01 2.507e+01 2.790e+01 3.633e+01, threshold=5.013e+01, percent-clipped=0.0 2024-08-20 03:41:04,634 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 14 from LS+wenet, 16 from Vox, 20 from AS 2024-08-20 03:41:22,578 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.95 vs. limit=10.0 2024-08-20 03:41:24,855 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 29 from LS+wenet, 27 from Vox, 34 from AS 2024-08-20 03:41:25,021 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4648500.0, ans=0.1 2024-08-20 03:41:29,549 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 
17 from LS+wenet, 16 from Vox, 34 from AS 2024-08-20 03:41:34,010 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4648600.0, ans=0.0 2024-08-20 03:41:55,962 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4648700.0, ans=0.125 2024-08-20 03:42:18,161 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 5500, loss[loss=0.1189, beats_loss=0.01115, ecapa_loss=0.0001341, whisper_loss=0.1064, over 23469.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01035, ecapa_loss=0.0001406, whisper_loss=0.09068, over 3795035.14 frames. ], batch size: 92, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:42:24,747 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4648800.0, ans=0.1 2024-08-20 03:42:50,470 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 20 from LS+wenet, 14 from Vox, 32 from AS 2024-08-20 03:43:10,531 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4649000.0, ans=0.125 2024-08-20 03:43:32,378 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 13 from LS+wenet, 18 from Vox, 33 from AS 2024-08-20 03:43:39,252 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=4649100.0, ans=0.05 2024-08-20 03:43:39,561 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.11 vs. 
limit=15.0 2024-08-20 03:43:44,925 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4649100.0, ans=0.125 2024-08-20 03:43:49,373 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4649200.0, ans=0.0 2024-08-20 03:44:00,348 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 32 from LS+wenet, 15 from Vox, 27 from AS 2024-08-20 03:44:11,985 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 5550, loss[loss=0.1045, beats_loss=0.008632, ecapa_loss=0.0001351, whisper_loss=0.09456, over 14098.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01046, ecapa_loss=0.0001397, whisper_loss=0.09008, over 3810741.93 frames. ], batch size: 54, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:44:12,239 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 33 from LS+wenet, 17 from Vox, 41 from AS 2024-08-20 03:44:14,235 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 23 from LS+wenet, 26 from Vox, 41 from AS 2024-08-20 03:44:19,101 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 24 from LS+wenet, 27 from Vox, 34 from AS 2024-08-20 03:44:21,845 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 23 from LS+wenet, 21 from Vox, 28 from AS 2024-08-20 03:44:32,930 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4649300.0, ans=0.1 2024-08-20 03:44:35,508 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.290e+01 2.579e+01 2.821e+01 2.823e+02, threshold=5.158e+01, percent-clipped=2.0 2024-08-20 03:44:43,223 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 
25 from LS+wenet, 23 from Vox, 34 from AS 2024-08-20 03:44:48,138 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4649400.0, ans=0.0 2024-08-20 03:44:52,412 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 26 from LS+wenet, 16 from Vox, 31 from AS 2024-08-20 03:44:52,654 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4649400.0, ans=0.125 2024-08-20 03:44:57,509 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=4649400.0, ans=0.025 2024-08-20 03:44:59,646 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4649500.0, ans=0.0 2024-08-20 03:45:27,340 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 22 from LS+wenet, 27 from Vox, 35 from AS 2024-08-20 03:45:35,467 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 20 from LS+wenet, 19 from Vox, 25 from AS 2024-08-20 03:46:11,397 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 5600, loss[loss=0.1055, beats_loss=0.01186, ecapa_loss=0.0001511, whisper_loss=0.09215, over 19851.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01043, ecapa_loss=0.0001392, whisper_loss=0.09038, over 3837489.60 frames. ], batch size: 82, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:46:24,367 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 24 from LS+wenet, 28 from Vox, 39 from AS 2024-08-20 03:46:27,126 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4649800.0, ans=0.125 2024-08-20 03:47:18,009 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.06 vs. 
limit=15.0 2024-08-20 03:47:59,304 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 5650, loss[loss=0.1089, beats_loss=0.01027, ecapa_loss=0.0001557, whisper_loss=0.09708, over 21795.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01034, ecapa_loss=0.0001408, whisper_loss=0.09117, over 3843259.44 frames. ], batch size: 90, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:48:11,793 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 22 from LS+wenet, 14 from Vox, 19 from AS 2024-08-20 03:48:19,085 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.03 vs. limit=6.0 2024-08-20 03:48:20,047 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.429e+01 2.607e+01 2.937e+01 4.534e+02, threshold=5.214e+01, percent-clipped=3.0 2024-08-20 03:48:56,270 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 34 from LS+wenet, 22 from Vox, 34 from AS 2024-08-20 03:49:01,130 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.27 vs. limit=15.0 2024-08-20 03:49:12,076 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 33 from LS+wenet, 25 from Vox, 32 from AS 2024-08-20 03:49:40,105 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4650700.0, ans=0.2 2024-08-20 03:49:52,869 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=4.709e-01 2024-08-20 03:49:54,588 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 5700, loss[loss=0.1142, beats_loss=0.007444, ecapa_loss=0.0001295, whisper_loss=0.1055, over 14812.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01028, ecapa_loss=0.0001419, whisper_loss=0.09158, over 3812098.97 frames. 
], batch size: 54, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:50:19,485 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 20 from LS+wenet, 18 from Vox, 32 from AS 2024-08-20 03:50:56,079 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.88 vs. limit=12.0 2024-08-20 03:51:06,390 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4651100.0, ans=0.025 2024-08-20 03:51:11,033 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4651100.0, ans=0.0 2024-08-20 03:51:33,568 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.43 vs. limit=6.0 2024-08-20 03:51:41,478 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 5750, loss[loss=0.08872, beats_loss=0.01044, ecapa_loss=0.0001435, whisper_loss=0.07685, over 18628.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01031, ecapa_loss=0.0001429, whisper_loss=0.09098, over 3815816.27 frames. ], batch size: 77, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:51:41,719 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 28 from LS+wenet, 29 from Vox, 30 from AS 2024-08-20 03:51:43,437 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 30 from LS+wenet, 31 from Vox, 30 from AS 2024-08-20 03:51:55,022 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 14 from LS+wenet, 19 from Vox, 34 from AS 2024-08-20 03:52:01,518 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.309e+01 2.653e+01 2.956e+01 1.340e+02, threshold=5.306e+01, percent-clipped=1.0 2024-08-20 03:52:13,266 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 
26 from LS+wenet, 26 from Vox, 38 from AS 2024-08-20 03:52:49,346 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.15 vs. limit=6.0 2024-08-20 03:52:55,285 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4651600.0, ans=0.0 2024-08-20 03:53:00,543 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.31 vs. limit=15.0 2024-08-20 03:53:07,107 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.21 vs. limit=15.0 2024-08-20 03:53:30,705 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 5800, loss[loss=0.1248, beats_loss=0.008777, ecapa_loss=0.0001273, whisper_loss=0.1148, over 18381.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01032, ecapa_loss=0.0001415, whisper_loss=0.09063, over 3840822.89 frames. ], batch size: 70, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:53:32,833 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 19 from LS+wenet, 22 from Vox, 36 from AS 2024-08-20 03:53:35,719 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4651800.0, ans=0.0 2024-08-20 03:53:40,033 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 36 from LS+wenet, 28 from Vox, 26 from AS 2024-08-20 03:53:40,476 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.70 vs. 
limit=22.5 2024-08-20 03:53:46,581 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4651800.0, ans=0.1 2024-08-20 03:54:18,991 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4652000.0, ans=0.125 2024-08-20 03:54:51,230 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4652100.0, ans=0.125 2024-08-20 03:54:53,608 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4652200.0, ans=0.125 2024-08-20 03:54:53,652 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=4652200.0, ans=0.5 2024-08-20 03:55:14,582 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4652300.0, ans=0.1 2024-08-20 03:55:15,721 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 5850, loss[loss=0.0741, beats_loss=0.01275, ecapa_loss=0.0001101, whisper_loss=0.06025, over 17910.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01037, ecapa_loss=0.000142, whisper_loss=0.09044, over 3819699.88 frames. ], batch size: 72, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:55:16,574 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.45 vs. 
limit=12.0 2024-08-20 03:55:22,768 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4652300.0, ans=0.125 2024-08-20 03:55:34,520 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.203e+01 2.512e+01 2.750e+01 3.616e+02, threshold=5.024e+01, percent-clipped=2.0 2024-08-20 03:55:36,791 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4652400.0, ans=0.0 2024-08-20 03:55:38,660 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 22 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-20 03:55:40,719 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 16 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-20 03:56:18,623 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 30 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-20 03:56:29,900 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 27 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-20 03:56:36,555 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 33 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-20 03:57:05,954 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 5900, loss[loss=0.08037, beats_loss=0.01194, ecapa_loss=9.851e-05, whisper_loss=0.06744, over 16843.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01044, ecapa_loss=0.0001415, whisper_loss=0.08942, over 3772101.46 frames. ], batch size: 64, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:57:33,932 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4652900.0, ans=0.1 2024-08-20 03:57:46,511 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.30 vs. limit=10.0 2024-08-20 03:58:12,112 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 
17 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-20 03:58:55,817 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 21 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-20 03:58:59,959 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 5950, loss[loss=0.1088, beats_loss=0.01013, ecapa_loss=0.0001226, whisper_loss=0.09743, over 21170.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01039, ecapa_loss=0.0001413, whisper_loss=0.08972, over 3800006.37 frames. ], batch size: 80, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:59:14,971 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.87 vs. limit=22.5 2024-08-20 03:59:21,068 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.326e+01 2.621e+01 2.901e+01 3.816e+01, threshold=5.242e+01, percent-clipped=0.0 2024-08-20 03:59:32,492 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.06 vs. limit=15.0 2024-08-20 03:59:40,253 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 24 from LS+wenet, 20 from Vox, 49 fro AS 2024-08-20 03:59:42,953 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4653500.0, ans=0.0 2024-08-20 03:59:44,895 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4653500.0, ans=0.125 2024-08-20 04:00:06,818 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4653600.0, ans=0.125 2024-08-20 04:00:49,264 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 6000, loss[loss=0.09352, beats_loss=0.009262, ecapa_loss=0.0001543, whisper_loss=0.08272, over 16653.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01035, ecapa_loss=0.000142, whisper_loss=0.08992, over 3792758.23 frames. 
], batch size: 68, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:00:49,264 INFO [train_multi_KD3.py:1140] (1/4) Computing validation loss 2024-08-20 04:01:25,923 INFO [train_multi_KD3.py:1150] (1/4) Epoch 32, validation on ASR_libri: loss=0.2536, beats_loss=0, ecapa_loss=0.0005122, whisper_loss=0.2485, over 931116.00 frames. 2024-08-20 04:01:50,421 INFO [train_multi_KD3.py:1150] (1/4) Epoch 32, validation on SV_voxceleb1: loss=0.003973, beats_loss=0, ecapa_loss=0.0003973, whisper_loss=0, over 944235.00 frames. 2024-08-20 04:03:25,286 INFO [train_multi_KD3.py:1150] (1/4) Epoch 32, validation on AT_audioset: loss=0.02299, beats_loss=0.02299, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-20 04:03:25,290 INFO [train_multi_KD3.py:1156] (1/4) Maximum memory allocated so far is 30838MB 2024-08-20 04:03:52,487 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4653900.0, ans=0.2 2024-08-20 04:03:55,916 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 19 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-20 04:04:28,431 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4654100.0, ans=0.0 2024-08-20 04:04:52,028 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4654200.0, ans=0.1 2024-08-20 04:04:54,626 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 6050, loss[loss=0.1174, beats_loss=0.009419, ecapa_loss=0.000138, whisper_loss=0.1066, over 23014.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01046, ecapa_loss=0.0001415, whisper_loss=0.08903, over 3776453.13 frames. ], batch size: 93, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:05:04,794 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 
29 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-20 04:05:09,307 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.277e+01 2.536e+01 2.822e+01 4.959e+01, threshold=5.072e+01, percent-clipped=0.0 2024-08-20 04:05:09,628 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 23 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-20 04:05:51,166 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.32 vs. limit=12.0 2024-08-20 04:05:59,187 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4654600.0, ans=0.1 2024-08-20 04:06:05,169 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 22 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-20 04:06:11,272 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4654700.0, ans=0.125 2024-08-20 04:06:22,306 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4654800.0, ans=0.0 2024-08-20 04:06:24,003 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 6100, loss[loss=0.1067, beats_loss=0.01135, ecapa_loss=0.0001232, whisper_loss=0.09407, over 20060.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01055, ecapa_loss=0.000141, whisper_loss=0.08864, over 3779494.93 frames. ], batch size: 79, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:06:24,375 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4654800.0, ans=0.1 2024-08-20 04:07:02,373 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 
21 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-20 04:07:36,716 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4655100.0, ans=0.125 2024-08-20 04:07:58,177 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4655200.0, ans=0.0 2024-08-20 04:08:11,816 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 6150, loss[loss=0.0821, beats_loss=0.0129, ecapa_loss=0.0001525, whisper_loss=0.06767, over 20075.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0105, ecapa_loss=0.00014, whisper_loss=0.08938, over 3781586.75 frames. ], batch size: 88, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:08:12,009 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 25 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-20 04:08:12,362 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4655300.0, ans=0.125 2024-08-20 04:08:26,219 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=4655300.0, ans=15.0 2024-08-20 04:08:31,153 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.716e+01 2.289e+01 2.520e+01 2.857e+01 4.942e+02, threshold=5.040e+01, percent-clipped=2.0 2024-08-20 04:09:21,622 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 19 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-20 04:09:26,043 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 28 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-20 04:10:01,755 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 6200, loss[loss=0.08979, beats_loss=0.009293, ecapa_loss=0.0001781, whisper_loss=0.07871, over 20230.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01054, ecapa_loss=0.0001393, whisper_loss=0.09005, over 3800084.25 frames. 
], batch size: 89, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:10:35,193 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-20 04:10:50,466 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=4656000.0, ans=10.0 2024-08-20 04:10:50,466 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.96 vs. limit=10.0 2024-08-20 04:11:25,772 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 19 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-20 04:11:26,462 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.200e+05 2024-08-20 04:11:46,963 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4656200.0, ans=0.125 2024-08-20 04:11:50,490 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 6250, loss[loss=0.08411, beats_loss=0.007369, ecapa_loss=0.0002256, whisper_loss=0.07449, over 16302.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01051, ecapa_loss=0.0001396, whisper_loss=0.09054, over 3816826.45 frames. ], batch size: 71, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:11:57,204 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 32 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-20 04:11:59,138 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4656300.0, ans=0.1 2024-08-20 04:12:09,516 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.762e+01 2.244e+01 2.486e+01 2.895e+01 5.036e+01, threshold=4.971e+01, percent-clipped=0.0 2024-08-20 04:12:59,837 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 
31 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-20 04:13:06,622 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.32 vs. limit=15.0 2024-08-20 04:13:41,014 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 6300, loss[loss=0.1095, beats_loss=0.007425, ecapa_loss=0.0001518, whisper_loss=0.1005, over 22238.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01045, ecapa_loss=0.0001408, whisper_loss=0.08992, over 3835513.44 frames. ], batch size: 91, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:13:46,541 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4656800.0, ans=0.2 2024-08-20 04:14:02,770 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4656900.0, ans=0.0 2024-08-20 04:14:11,280 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4656900.0, ans=0.0 2024-08-20 04:14:16,357 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4656900.0, ans=0.0 2024-08-20 04:14:21,249 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 27 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-20 04:14:31,186 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 21 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-20 04:14:57,897 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4657100.0, ans=10.0 2024-08-20 04:15:11,111 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4657200.0, ans=0.125 2024-08-20 04:15:19,973 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 
22 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-20 04:15:36,370 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 6350, loss[loss=0.1091, beats_loss=0.009981, ecapa_loss=0.0001503, whisper_loss=0.09764, over 16125.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01045, ecapa_loss=0.0001413, whisper_loss=0.0896, over 3859105.87 frames. ], batch size: 64, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:15:41,305 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 19 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-20 04:15:52,364 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4657300.0, ans=0.0 2024-08-20 04:15:56,333 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.766e+01 2.231e+01 2.542e+01 2.829e+01 6.825e+01, threshold=5.084e+01, percent-clipped=1.0 2024-08-20 04:16:03,500 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 22 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-20 04:16:08,646 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4657400.0, ans=0.5 2024-08-20 04:16:17,945 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4657400.0, ans=0.0 2024-08-20 04:16:21,070 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 32 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-20 04:16:35,643 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 18 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-20 04:17:26,689 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 6400, loss[loss=0.1011, beats_loss=0.01157, ecapa_loss=0.0001527, whisper_loss=0.08798, over 21373.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01051, ecapa_loss=0.000142, whisper_loss=0.08955, over 3872703.33 frames. 
], batch size: 89, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:18:05,287 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4657900.0, ans=0.0 2024-08-20 04:18:11,921 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4658000.0, ans=0.0 2024-08-20 04:18:27,593 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4658000.0, ans=0.125 2024-08-20 04:18:32,286 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.46 vs. limit=15.0 2024-08-20 04:18:40,866 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 23 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-20 04:18:50,684 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4658100.0, ans=0.2 2024-08-20 04:18:59,516 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4658200.0, ans=0.0 2024-08-20 04:19:01,306 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 25 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-20 04:19:09,329 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4658200.0, ans=0.125 2024-08-20 04:19:18,224 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 6450, loss[loss=0.09886, beats_loss=0.01263, ecapa_loss=0.0001246, whisper_loss=0.08498, over 22876.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01048, ecapa_loss=0.0001422, whisper_loss=0.08958, over 3817380.57 frames. 
], batch size: 91, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:19:38,702 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.780e+01 2.209e+01 2.444e+01 2.735e+01 9.511e+01, threshold=4.888e+01, percent-clipped=1.0 2024-08-20 04:19:39,258 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 17 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-20 04:19:44,002 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.85 vs. limit=15.0 2024-08-20 04:19:45,854 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 19 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-20 04:19:52,500 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4658400.0, ans=0.125 2024-08-20 04:19:54,747 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.34 vs. limit=6.0 2024-08-20 04:19:58,761 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 14 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-20 04:20:21,816 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4658500.0, ans=0.125 2024-08-20 04:21:11,435 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 6500, loss[loss=0.1043, beats_loss=0.008338, ecapa_loss=0.0001447, whisper_loss=0.09455, over 17495.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01047, ecapa_loss=0.0001415, whisper_loss=0.08913, over 3821497.18 frames. 
], batch size: 67, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:21:14,107 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4658800.0, ans=0.125 2024-08-20 04:21:26,505 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4658800.0, ans=0.1 2024-08-20 04:21:39,135 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4658900.0, ans=0.1 2024-08-20 04:21:52,971 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.19 vs. limit=22.5 2024-08-20 04:22:01,649 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.87 vs. limit=12.0 2024-08-20 04:22:03,623 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 20 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-20 04:22:04,191 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.91 vs. limit=10.0 2024-08-20 04:22:13,299 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4659100.0, ans=0.1 2024-08-20 04:22:19,639 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4659100.0, ans=0.0 2024-08-20 04:23:02,357 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 6550, loss[loss=0.1042, beats_loss=0.008499, ecapa_loss=0.0001718, whisper_loss=0.09393, over 21200.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01046, ecapa_loss=0.0001412, whisper_loss=0.08981, over 3831893.99 frames. 
], batch size: 88, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:23:23,951 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.971e+01 2.308e+01 2.565e+01 2.877e+01 4.180e+01, threshold=5.130e+01, percent-clipped=0.0 2024-08-20 04:23:24,237 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 17 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-20 04:23:34,227 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4659400.0, ans=0.125 2024-08-20 04:23:43,918 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 27 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-20 04:24:13,083 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 30 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-20 04:24:15,781 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 04:24:38,584 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4659700.0, ans=0.125 2024-08-20 04:24:49,623 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4659700.0, ans=0.2 2024-08-20 04:24:54,279 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4659700.0, ans=0.2 2024-08-20 04:25:01,160 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 6600, loss[loss=0.1112, beats_loss=0.009439, ecapa_loss=0.0001225, whisper_loss=0.1006, over 18336.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01043, ecapa_loss=0.0001411, whisper_loss=0.09059, over 3848292.44 frames. ], batch size: 70, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:25:12,077 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.85 vs. 
limit=10.0 2024-08-20 04:25:18,542 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=4659800.0, ans=0.025 2024-08-20 04:25:20,466 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.51 vs. limit=22.5 2024-08-20 04:25:29,441 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 20 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-20 04:25:43,561 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4660000.0, ans=0.1 2024-08-20 04:25:43,617 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4660000.0, ans=0.125 2024-08-20 04:26:05,471 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 21 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-20 04:26:19,045 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=4660100.0, ans=10.0 2024-08-20 04:26:36,489 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=4660200.0, ans=15.0 2024-08-20 04:26:52,837 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 6650, loss[loss=0.1056, beats_loss=0.01133, ecapa_loss=0.0001248, whisper_loss=0.09304, over 21970.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01046, ecapa_loss=0.00014, whisper_loss=0.09076, over 3863026.74 frames. 
], batch size: 91, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:26:55,532 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4660300.0, ans=0.0 2024-08-20 04:27:14,224 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.439e+01 2.716e+01 3.206e+01 5.057e+01, threshold=5.432e+01, percent-clipped=0.0 2024-08-20 04:27:21,560 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 19 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-20 04:28:05,243 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4660600.0, ans=0.0 2024-08-20 04:28:10,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4660600.0, ans=0.125 2024-08-20 04:28:20,058 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4660600.0, ans=0.125 2024-08-20 04:28:31,341 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4660700.0, ans=0.125 2024-08-20 04:28:35,748 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4660700.0, ans=0.125 2024-08-20 04:28:52,036 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 6700, loss[loss=0.1063, beats_loss=0.01078, ecapa_loss=0.0001366, whisper_loss=0.09419, over 22795.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01047, ecapa_loss=0.0001401, whisper_loss=0.09027, over 3891050.01 frames. ], batch size: 91, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:29:12,690 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 
17 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-20 04:29:15,679 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4660900.0, ans=0.0 2024-08-20 04:29:25,640 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 24 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-20 04:30:01,141 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4661100.0, ans=0.125 2024-08-20 04:30:01,369 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. limit=6.0 2024-08-20 04:30:19,097 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4661100.0, ans=0.2 2024-08-20 04:30:26,254 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4661200.0, ans=0.125 2024-08-20 04:30:30,207 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4661200.0, ans=0.125 2024-08-20 04:30:49,565 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 6750, loss[loss=0.1054, beats_loss=0.01011, ecapa_loss=0.0001572, whisper_loss=0.0937, over 12696.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01043, ecapa_loss=0.0001408, whisper_loss=0.09044, over 3884077.09 frames. ], batch size: 52, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:30:58,500 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 27 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-20 04:31:06,888 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.08 vs. 
limit=15.0 2024-08-20 04:31:08,749 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.685e+01 2.262e+01 2.503e+01 2.805e+01 3.998e+01, threshold=5.006e+01, percent-clipped=0.0 2024-08-20 04:31:18,612 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4661400.0, ans=0.125 2024-08-20 04:31:27,966 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 23 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-20 04:31:32,595 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4661500.0, ans=0.09899494936611666 2024-08-20 04:31:35,429 INFO [train_multi_KD3.py:845] (1/4) A total of 95 cuts. 23 from LS+wenet, 27 from Vox, 45 fro AS 2024-08-20 04:31:52,349 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 26 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-20 04:31:59,299 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4661600.0, ans=0.0 2024-08-20 04:32:16,487 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 17 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-20 04:32:26,187 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4661700.0, ans=0.0 2024-08-20 04:32:33,651 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.13 vs. limit=15.0 2024-08-20 04:32:41,868 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 6800, loss[loss=0.1099, beats_loss=0.009546, ecapa_loss=0.0001759, whisper_loss=0.0986, over 22842.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0104, ecapa_loss=0.0001409, whisper_loss=0.09025, over 3885530.38 frames. ], batch size: 93, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:32:58,396 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 
20 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-20 04:33:01,063 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4661800.0, ans=0.1 2024-08-20 04:33:36,814 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4662000.0, ans=0.125 2024-08-20 04:33:42,731 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4662000.0, ans=0.2 2024-08-20 04:33:46,013 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.26 vs. limit=5.0 2024-08-20 04:33:51,486 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 22 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-20 04:34:35,303 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 6850, loss[loss=0.09725, beats_loss=0.01075, ecapa_loss=0.0001371, whisper_loss=0.08512, over 21954.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01036, ecapa_loss=0.0001411, whisper_loss=0.09011, over 3849736.36 frames. ], batch size: 87, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:34:38,553 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 
29 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-20 04:34:45,419 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4662300.0, ans=0.125 2024-08-20 04:34:55,750 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.273e+01 2.508e+01 2.881e+01 4.383e+01, threshold=5.015e+01, percent-clipped=0.0 2024-08-20 04:35:04,509 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4662400.0, ans=0.125 2024-08-20 04:35:04,882 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.83 vs. limit=6.0 2024-08-20 04:35:11,708 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4662400.0, ans=0.0 2024-08-20 04:35:40,587 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 27 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-20 04:35:44,753 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 26 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-20 04:35:52,252 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4662600.0, ans=0.09899494936611666 2024-08-20 04:36:03,961 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4662700.0, ans=0.125 2024-08-20 04:36:08,449 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 33 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-20 04:36:12,294 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 
24 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-20 04:36:24,152 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4662700.0, ans=0.0 2024-08-20 04:36:27,890 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 6900, loss[loss=0.09341, beats_loss=0.01102, ecapa_loss=0.0001195, whisper_loss=0.0812, over 22542.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01031, ecapa_loss=0.0001412, whisper_loss=0.09019, over 3864568.30 frames. ], batch size: 89, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:36:51,930 INFO [train_multi_KD3.py:845] (1/4) A total of 49 cuts. 14 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-20 04:37:15,321 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 34 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-20 04:37:21,041 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 32 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-20 04:37:21,486 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.32 vs. limit=22.5 2024-08-20 04:37:39,365 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.72 vs. limit=6.0 2024-08-20 04:38:06,097 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 22 from LS+wenet, 37 from Vox, 33 fro AS 2024-08-20 04:38:14,710 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 6950, loss[loss=0.1287, beats_loss=0.008679, ecapa_loss=0.0001544, whisper_loss=0.1185, over 22675.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01035, ecapa_loss=0.0001422, whisper_loss=0.09059, over 3846624.36 frames. 
], batch size: 90, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:38:35,735 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.402e+01 2.667e+01 2.923e+01 3.663e+02, threshold=5.334e+01, percent-clipped=2.0 2024-08-20 04:38:38,372 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4663400.0, ans=0.04949747468305833 2024-08-20 04:38:44,372 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 21 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-20 04:38:46,846 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4663400.0, ans=0.125 2024-08-20 04:39:31,370 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4663600.0, ans=0.1 2024-08-20 04:39:52,708 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 12 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-20 04:39:58,160 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 7000, loss[loss=0.1122, beats_loss=0.009158, ecapa_loss=0.0001577, whisper_loss=0.1015, over 20181.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01037, ecapa_loss=0.0001414, whisper_loss=0.09041, over 3811533.83 frames. ], batch size: 83, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 04:40:02,180 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4663800.0, ans=0.0 2024-08-20 04:40:35,375 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 30 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-20 04:40:43,388 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.67 vs. limit=15.0 2024-08-20 04:40:45,215 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.12 vs. 
limit=15.0 2024-08-20 04:40:46,376 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4664000.0, ans=0.0 2024-08-20 04:40:48,440 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4664000.0, ans=0.0 2024-08-20 04:40:55,115 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4664100.0, ans=0.2 2024-08-20 04:40:57,086 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4664100.0, ans=0.2 2024-08-20 04:40:58,917 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 28 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-20 04:41:18,932 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4664200.0, ans=0.1 2024-08-20 04:41:31,593 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 7050, loss[loss=0.09931, beats_loss=0.0118, ecapa_loss=0.0001423, whisper_loss=0.08609, over 22043.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01044, ecapa_loss=0.0001409, whisper_loss=0.09083, over 3827611.65 frames. ], batch size: 94, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 04:41:47,560 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.321e+01 2.580e+01 2.916e+01 2.806e+02, threshold=5.159e+01, percent-clipped=2.0 2024-08-20 04:42:27,311 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4664600.0, ans=0.2 2024-08-20 04:43:05,567 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 7100, loss[loss=0.09508, beats_loss=0.01039, ecapa_loss=0.0001718, whisper_loss=0.08298, over 20461.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01054, ecapa_loss=0.0001414, whisper_loss=0.09084, over 3821761.57 frames. 
], batch size: 86, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 04:43:30,904 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4664900.0, ans=0.2 2024-08-20 04:43:51,611 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4665000.0, ans=0.125 2024-08-20 04:43:54,554 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4665000.0, ans=0.125 2024-08-20 04:43:57,086 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4665000.0, ans=0.04949747468305833 2024-08-20 04:43:59,255 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 22 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-20 04:44:02,914 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 17 from LS+wenet, 19 from Vox, 17 fro AS 2024-08-20 04:44:06,061 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4665000.0, ans=0.0 2024-08-20 04:44:21,004 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4665100.0, ans=0.125 2024-08-20 04:44:43,467 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 24 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-20 04:44:50,484 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4665200.0, ans=0.125 2024-08-20 04:44:57,406 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 7150, loss[loss=0.1042, beats_loss=0.009539, ecapa_loss=0.0001487, whisper_loss=0.09319, over 20154.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01054, ecapa_loss=0.0001419, whisper_loss=0.09093, over 3856947.49 frames. 
], batch size: 82, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 04:45:17,369 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.647e+01 2.230e+01 2.408e+01 2.713e+01 4.387e+01, threshold=4.817e+01, percent-clipped=0.0 2024-08-20 04:45:32,290 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4665400.0, ans=0.1 2024-08-20 04:45:34,816 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4665400.0, ans=0.0 2024-08-20 04:45:51,268 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.64 vs. limit=15.0 2024-08-20 04:46:14,827 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.26 vs. limit=22.5 2024-08-20 04:46:52,135 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 7200, loss[loss=0.0782, beats_loss=0.01108, ecapa_loss=0.000154, whisper_loss=0.06558, over 18886.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01054, ecapa_loss=0.0001415, whisper_loss=0.09021, over 3861491.62 frames. ], batch size: 79, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 04:47:05,712 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 
22 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-20 04:47:07,951 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4665800.0, ans=0.0 2024-08-20 04:47:16,747 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4665900.0, ans=0.1 2024-08-20 04:47:37,058 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4666000.0, ans=0.125 2024-08-20 04:47:40,967 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 20 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-20 04:47:56,191 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 22 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-20 04:48:18,600 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4666100.0, ans=0.1 2024-08-20 04:48:44,272 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 7250, loss[loss=0.08864, beats_loss=0.009439, ecapa_loss=0.0002056, whisper_loss=0.07715, over 18975.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0105, ecapa_loss=0.0001409, whisper_loss=0.09, over 3843055.36 frames. ], batch size: 85, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 04:48:46,479 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 29 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-20 04:48:57,186 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4666300.0, ans=0.125 2024-08-20 04:48:59,495 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 
26 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-20 04:49:04,002 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.278e+01 2.449e+01 2.713e+01 3.965e+01, threshold=4.897e+01, percent-clipped=0.0 2024-08-20 04:49:20,073 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.91 vs. limit=15.0 2024-08-20 04:49:22,083 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.42 vs. limit=15.0 2024-08-20 04:49:36,611 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4666500.0, ans=0.035 2024-08-20 04:50:19,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4666700.0, ans=0.125 2024-08-20 04:50:25,207 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4666700.0, ans=0.125 2024-08-20 04:50:33,852 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 7300, loss[loss=0.1078, beats_loss=0.01071, ecapa_loss=0.0001416, whisper_loss=0.09568, over 21280.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01049, ecapa_loss=0.0001404, whisper_loss=0.09015, over 3865943.23 frames. ], batch size: 88, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 04:51:11,260 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 
18 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-20 04:51:28,774 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4667000.0, ans=0.025 2024-08-20 04:51:35,534 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4667000.0, ans=0.125 2024-08-20 04:52:09,856 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4667200.0, ans=0.125 2024-08-20 04:52:20,998 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 13 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-20 04:52:29,459 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 7350, loss[loss=0.09437, beats_loss=0.01124, ecapa_loss=0.0001196, whisper_loss=0.08193, over 20890.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01034, ecapa_loss=0.0001417, whisper_loss=0.09066, over 3853065.16 frames. ], batch size: 83, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 04:52:50,459 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.276e+01 2.449e+01 2.717e+01 4.858e+01, threshold=4.897e+01, percent-clipped=0.0 2024-08-20 04:53:03,172 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 
31 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-20 04:53:21,127 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4667500.0, ans=0.0 2024-08-20 04:53:46,115 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4667600.0, ans=0.0 2024-08-20 04:53:48,776 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4667600.0, ans=0.1 2024-08-20 04:54:19,600 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4667800.0, ans=0.0 2024-08-20 04:54:20,628 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 7400, loss[loss=0.1058, beats_loss=0.01249, ecapa_loss=0.000117, whisper_loss=0.09215, over 15996.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01035, ecapa_loss=0.0001413, whisper_loss=0.09101, over 3873966.87 frames. ], batch size: 64, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 04:54:38,645 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4667800.0, ans=0.125 2024-08-20 04:54:39,616 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 37 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-20 04:54:49,112 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4667900.0, ans=0.125 2024-08-20 04:55:07,532 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4668000.0, ans=0.0 2024-08-20 04:55:19,557 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4668000.0, ans=0.125 2024-08-20 04:55:27,221 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 
31 from LS+wenet, 31 from Vox, 29 fro AS 2024-08-20 04:55:43,054 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4668100.0, ans=0.125 2024-08-20 04:56:04,725 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4668200.0, ans=0.04949747468305833 2024-08-20 04:56:18,303 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 7450, loss[loss=0.09909, beats_loss=0.01068, ecapa_loss=0.0001392, whisper_loss=0.08701, over 23109.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01034, ecapa_loss=0.0001418, whisper_loss=0.09151, over 3879042.48 frames. ], batch size: 93, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 04:56:21,439 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4668300.0, ans=0.2 2024-08-20 04:56:32,819 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 16 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-20 04:56:39,311 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.202e+01 2.465e+01 2.731e+01 3.799e+01, threshold=4.929e+01, percent-clipped=0.0 2024-08-20 04:56:51,931 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 33 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-20 04:56:59,089 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 25 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-20 04:58:08,921 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4668700.0, ans=0.125 2024-08-20 04:58:12,396 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 7500, loss[loss=0.1252, beats_loss=0.007938, ecapa_loss=0.0001582, whisper_loss=0.1157, over 14467.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0104, ecapa_loss=0.0001417, whisper_loss=0.09068, over 3845935.94 frames. 
], batch size: 57, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 04:58:21,462 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 17 from LS+wenet, 11 from Vox, 36 fro AS 2024-08-20 04:58:28,645 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4668800.0, ans=0.125 2024-08-20 04:59:05,164 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 24 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-20 04:59:07,132 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4669000.0, ans=10.0 2024-08-20 04:59:16,207 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4669100.0, ans=0.125 2024-08-20 04:59:24,344 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4669100.0, ans=0.125 2024-08-20 04:59:36,315 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.38 vs. limit=6.0 2024-08-20 04:59:43,262 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 32 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-20 04:59:46,297 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4669200.0, ans=0.125 2024-08-20 04:59:52,443 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4669200.0, ans=0.015 2024-08-20 05:00:05,034 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 7550, loss[loss=0.1231, beats_loss=0.009401, ecapa_loss=0.0001472, whisper_loss=0.1123, over 22739.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01041, ecapa_loss=0.0001417, whisper_loss=0.09082, over 3816959.48 frames. 
], batch size: 91, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:00:05,232 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 31 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-20 05:00:08,402 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.87 vs. limit=15.0 2024-08-20 05:00:13,541 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 22 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-20 05:00:22,604 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.706e+01 2.290e+01 2.609e+01 3.060e+01 2.674e+02, threshold=5.218e+01, percent-clipped=1.0 2024-08-20 05:00:25,551 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4669400.0, ans=0.1 2024-08-20 05:00:27,428 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 23 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-20 05:00:27,625 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4669400.0, ans=0.125 2024-08-20 05:00:42,916 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=4669400.0, ans=0.2 2024-08-20 05:00:42,978 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4669400.0, ans=0.125 2024-08-20 05:00:51,256 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 29 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-20 05:01:34,348 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4669700.0, ans=0.1 2024-08-20 05:01:57,853 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 7600, loss[loss=0.06861, beats_loss=0.01072, ecapa_loss=0.0001581, whisper_loss=0.05631, over 13681.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.01044, ecapa_loss=0.0001408, whisper_loss=0.0903, over 3794299.84 frames. ], batch size: 55, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:02:04,783 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4669800.0, ans=0.1 2024-08-20 05:02:09,588 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4669800.0, ans=0.0 2024-08-20 05:03:01,530 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.82 vs. limit=22.5 2024-08-20 05:03:14,071 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 21 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-20 05:03:38,011 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4670200.0, ans=0.125 2024-08-20 05:03:48,110 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 7650, loss[loss=0.08044, beats_loss=0.01388, ecapa_loss=0.0001157, whisper_loss=0.0654, over 15630.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01042, ecapa_loss=0.0001404, whisper_loss=0.09006, over 3775904.50 frames. ], batch size: 64, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:04:08,386 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.986e+01 2.328e+01 2.537e+01 2.838e+01 5.582e+01, threshold=5.074e+01, percent-clipped=1.0 2024-08-20 05:04:13,038 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.83 vs. 
limit=6.0 2024-08-20 05:04:14,709 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4670400.0, ans=0.95 2024-08-20 05:05:21,287 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4670700.0, ans=0.0 2024-08-20 05:05:32,574 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 7700, loss[loss=0.08496, beats_loss=0.01041, ecapa_loss=0.0001335, whisper_loss=0.07321, over 13063.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01036, ecapa_loss=0.0001419, whisper_loss=0.08996, over 3767286.83 frames. ], batch size: 49, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:05:49,565 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4670800.0, ans=0.0 2024-08-20 05:05:52,525 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.09 vs. limit=12.0 2024-08-20 05:05:56,767 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4670900.0, ans=0.125 2024-08-20 05:06:16,550 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-20 05:06:21,293 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4671000.0, ans=0.0 2024-08-20 05:06:36,605 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4671000.0, ans=0.1 2024-08-20 05:07:00,938 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 27 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-20 05:07:24,177 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.04 vs. 
limit=15.0 2024-08-20 05:07:28,234 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 7750, loss[loss=0.1114, beats_loss=0.01053, ecapa_loss=0.0001344, whisper_loss=0.09953, over 22556.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01034, ecapa_loss=0.0001415, whisper_loss=0.09088, over 3824704.04 frames. ], batch size: 90, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:07:33,485 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 16 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-20 05:07:39,992 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 16 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-20 05:07:49,781 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.935e+01 2.264e+01 2.430e+01 2.732e+01 4.233e+01, threshold=4.861e+01, percent-clipped=0.0 2024-08-20 05:09:31,019 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 7800, loss[loss=0.09042, beats_loss=0.006665, ecapa_loss=0.0001851, whisper_loss=0.0819, over 13050.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01035, ecapa_loss=0.0001412, whisper_loss=0.09022, over 3804907.69 frames. ], batch size: 52, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:09:44,931 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4671800.0, ans=0.0 2024-08-20 05:10:25,460 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4672000.0, ans=0.125 2024-08-20 05:10:30,328 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.70 vs. limit=15.0 2024-08-20 05:10:42,830 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4672100.0, ans=0.0 2024-08-20 05:11:10,448 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 
10 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-20 05:11:10,974 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.67 vs. limit=15.0 2024-08-20 05:11:26,335 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 7850, loss[loss=0.1115, beats_loss=0.009747, ecapa_loss=0.0001496, whisper_loss=0.1003, over 20646.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01037, ecapa_loss=0.0001403, whisper_loss=0.08995, over 3829506.48 frames. ], batch size: 84, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:11:41,734 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=4672300.0, ans=0.02 2024-08-20 05:11:46,582 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.268e+01 2.521e+01 2.830e+01 3.600e+01, threshold=5.042e+01, percent-clipped=0.0 2024-08-20 05:11:50,805 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4672400.0, ans=0.2 2024-08-20 05:12:08,531 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.89 vs. 
limit=22.5
2024-08-20 05:12:18,425 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4672500.0, ans=0.0
2024-08-20 05:12:29,072 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4672500.0, ans=0.125
2024-08-20 05:13:09,749 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4672700.0, ans=0.1
2024-08-20 05:13:13,402 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4672800.0, ans=0.125
2024-08-20 05:13:14,788 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 7900, loss[loss=0.1081, beats_loss=0.008729, ecapa_loss=0.0001738, whisper_loss=0.09758, over 14785.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01034, ecapa_loss=0.0001405, whisper_loss=0.08945, over 3804540.07 frames. ], batch size: 62, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 05:13:23,695 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4672800.0, ans=0.0
2024-08-20 05:13:32,768 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4672800.0, ans=0.05
2024-08-20 05:14:01,597 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4673000.0, ans=0.0
2024-08-20 05:14:05,126 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4673000.0, ans=0.125
2024-08-20 05:14:28,104 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4673100.0, ans=0.0
2024-08-20 05:14:47,268 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 28 from LS+wenet, 17 from Vox, 43 from AS
2024-08-20 05:14:52,376 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 25 from LS+wenet, 19 from Vox, 31 from AS
2024-08-20 05:14:57,070 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4673200.0, ans=0.0
2024-08-20 05:15:02,234 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4673200.0, ans=0.125
2024-08-20 05:15:08,732 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 7950, loss[loss=0.1359, beats_loss=0.008235, ecapa_loss=0.0001433, whisper_loss=0.1263, over 17264.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01037, ecapa_loss=0.0001401, whisper_loss=0.09013, over 3835765.28 frames. ], batch size: 65, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 05:15:28,184 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.672e+01 2.282e+01 2.544e+01 2.823e+01 6.203e+01, threshold=5.088e+01, percent-clipped=1.0
2024-08-20 05:15:49,240 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4673400.0, ans=0.1
2024-08-20 05:16:01,454 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.53 vs. limit=12.0
2024-08-20 05:16:50,095 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4673700.0, ans=0.125
2024-08-20 05:16:57,533 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 8000, loss[loss=0.09257, beats_loss=0.01308, ecapa_loss=0.0001431, whisper_loss=0.07806, over 14495.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01038, ecapa_loss=0.0001409, whisper_loss=0.09038, over 3790971.75 frames. ], batch size: 60, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 05:17:28,030 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.58 vs. limit=15.0
2024-08-20 05:17:31,913 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 31 from LS+wenet, 23 from Vox, 35 from AS
2024-08-20 05:17:36,009 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 19 from LS+wenet, 17 from Vox, 27 from AS
2024-08-20 05:17:38,808 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4674000.0, ans=0.125
2024-08-20 05:18:22,373 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 21 from LS+wenet, 17 from Vox, 32 from AS
2024-08-20 05:18:41,447 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 8050, loss[loss=0.109, beats_loss=0.009787, ecapa_loss=0.0001428, whisper_loss=0.09779, over 20020.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01042, ecapa_loss=0.0001399, whisper_loss=0.09058, over 3806122.28 frames. ], batch size: 81, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 05:18:41,697 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 20 from LS+wenet, 21 from Vox, 25 from AS
2024-08-20 05:18:59,651 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+01 2.351e+01 2.548e+01 2.857e+01 8.304e+01, threshold=5.095e+01, percent-clipped=2.0
2024-08-20 05:19:05,279 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 15 from LS+wenet, 19 from Vox, 31 from AS
2024-08-20 05:19:09,446 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 26 from LS+wenet, 26 from Vox, 34 from AS
2024-08-20 05:19:12,794 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.69 vs. limit=22.5
2024-08-20 05:19:34,241 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.52 vs. limit=15.0
2024-08-20 05:19:36,933 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.20 vs. limit=15.0
2024-08-20 05:19:53,350 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4674600.0, ans=0.125
2024-08-20 05:19:55,873 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4674600.0, ans=0.5
2024-08-20 05:20:23,349 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.63 vs. limit=15.0
2024-08-20 05:20:32,106 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 8100, loss[loss=0.09875, beats_loss=0.01224, ecapa_loss=0.000118, whisper_loss=0.08533, over 23341.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01049, ecapa_loss=0.0001406, whisper_loss=0.09064, over 3827689.85 frames. ], batch size: 93, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 05:20:41,453 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 18 from LS+wenet, 21 from Vox, 24 from AS
2024-08-20 05:20:41,764 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4674800.0, ans=0.125
2024-08-20 05:20:50,707 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4674800.0, ans=0.125
2024-08-20 05:21:46,550 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 30 from LS+wenet, 20 from Vox, 42 from AS
2024-08-20 05:22:05,215 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 from AS
2024-08-20 05:22:08,363 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 30 from LS+wenet, 18 from Vox, 39 from AS
2024-08-20 05:22:23,420 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=28.09 vs. limit=22.5
2024-08-20 05:22:25,230 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 8150, loss[loss=0.09393, beats_loss=0.01149, ecapa_loss=0.0001453, whisper_loss=0.08098, over 13499.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01052, ecapa_loss=0.0001389, whisper_loss=0.09102, over 3831164.18 frames. ], batch size: 54, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 05:22:47,188 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.721e+01 2.184e+01 2.427e+01 2.667e+01 4.030e+01, threshold=4.854e+01, percent-clipped=0.0
2024-08-20 05:23:11,210 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.50 vs. limit=15.0
2024-08-20 05:23:13,305 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-20 05:23:15,153 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 20 from LS+wenet, 23 from Vox, 47 from AS
2024-08-20 05:23:17,320 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4675500.0, ans=0.125
2024-08-20 05:23:31,640 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4675600.0, ans=0.125
2024-08-20 05:23:40,473 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 31 from LS+wenet, 16 from Vox, 42 from AS
2024-08-20 05:23:45,073 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 13 from LS+wenet, 18 from Vox, 26 from AS
2024-08-20 05:23:45,332 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4675600.0, ans=0.0
2024-08-20 05:23:54,381 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 19 from LS+wenet, 27 from Vox, 31 from AS
2024-08-20 05:23:59,437 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 28 from LS+wenet, 22 from Vox, 42 from AS
2024-08-20 05:24:01,989 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4675700.0, ans=0.0
2024-08-20 05:24:02,368 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.08 vs. limit=15.0
2024-08-20 05:24:06,940 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.64 vs. limit=15.0
2024-08-20 05:24:14,489 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4675700.0, ans=0.0
2024-08-20 05:24:21,741 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 8200, loss[loss=0.1023, beats_loss=0.008473, ecapa_loss=0.0001549, whisper_loss=0.09232, over 17392.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01059, ecapa_loss=0.000138, whisper_loss=0.09009, over 3839551.66 frames. ], batch size: 70, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 05:24:27,808 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4675800.0, ans=0.125
2024-08-20 05:24:43,440 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.17 vs. limit=22.5
2024-08-20 05:24:51,266 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4675900.0, ans=0.2
2024-08-20 05:24:55,838 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 29 from LS+wenet, 22 from Vox, 37 from AS
2024-08-20 05:25:04,706 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.20 vs. limit=22.5
2024-08-20 05:25:17,630 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 15 from LS+wenet, 13 from Vox, 22 from AS
2024-08-20 05:25:25,815 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4676000.0, ans=0.2
2024-08-20 05:25:32,134 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 26 from LS+wenet, 26 from Vox, 40 from AS
2024-08-20 05:25:34,275 WARNING [optim.py:496] (1/4) Scaling gradients by 0.03858109936118126, model_norm_threshold=48.536659240722656
2024-08-20 05:25:34,430 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.171e+05, grad_sumsq=2.171e+05, orig_rms_sq=1.000e+00
2024-08-20 05:26:11,499 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4676200.0, ans=0.0
2024-08-20 05:26:15,885 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 8250, loss[loss=0.1108, beats_loss=0.009889, ecapa_loss=0.0001516, whisper_loss=0.09943, over 17342.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0105, ecapa_loss=0.0001387, whisper_loss=0.09012, over 3850755.03 frames. ], batch size: 69, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 05:26:36,937 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.406e+01 2.629e+01 3.126e+01 1.258e+03, threshold=5.257e+01, percent-clipped=4.0
2024-08-20 05:26:55,389 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 21 from LS+wenet, 19 from Vox, 41 from AS
2024-08-20 05:27:22,710 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.90 vs. limit=22.5
2024-08-20 05:27:30,684 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 22 from LS+wenet, 22 from Vox, 40 from AS
2024-08-20 05:27:40,202 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.06 vs. limit=12.0
2024-08-20 05:27:48,930 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 16 from LS+wenet, 21 from Vox, 23 from AS
2024-08-20 05:28:06,262 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4676700.0, ans=0.2
2024-08-20 05:28:09,123 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4676700.0, ans=0.1
2024-08-20 05:28:15,369 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 8300, loss[loss=0.06994, beats_loss=0.01165, ecapa_loss=0.0001101, whisper_loss=0.05719, over 15755.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01051, ecapa_loss=0.0001386, whisper_loss=0.08961, over 3849542.71 frames. ], batch size: 62, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 05:28:25,263 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4676800.0, ans=0.0
2024-08-20 05:28:31,748 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4676800.0, ans=0.0
2024-08-20 05:28:48,399 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4676900.0, ans=0.0
2024-08-20 05:28:56,518 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4676900.0, ans=0.125
2024-08-20 05:29:12,798 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 21 from LS+wenet, 8 from Vox, 37 from AS
2024-08-20 05:29:17,339 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 25 from LS+wenet, 17 from Vox, 33 from AS
2024-08-20 05:29:29,202 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4677100.0, ans=0.125
2024-08-20 05:29:34,449 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4677100.0, ans=0.125
2024-08-20 05:29:39,428 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.07 vs. limit=12.0
2024-08-20 05:30:00,478 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 26 from LS+wenet, 22 from Vox, 23 from AS
2024-08-20 05:30:08,649 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 8350, loss[loss=0.1061, beats_loss=0.01006, ecapa_loss=0.0001251, whisper_loss=0.09475, over 22588.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01041, ecapa_loss=0.0001406, whisper_loss=0.0902, over 3856659.95 frames. ], batch size: 91, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 05:30:26,965 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.009e+01 2.380e+01 2.610e+01 3.013e+01 5.449e+01, threshold=5.219e+01, percent-clipped=1.0
2024-08-20 05:30:28,194 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.48 vs. limit=15.0
2024-08-20 05:30:29,573 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4677400.0, ans=0.07
2024-08-20 05:30:57,435 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 23 from LS+wenet, 19 from Vox, 17 from AS
2024-08-20 05:31:03,865 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4677500.0, ans=0.125
2024-08-20 05:31:05,907 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.45 vs. limit=22.5
2024-08-20 05:31:13,236 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4677600.0, ans=0.0
2024-08-20 05:31:22,659 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 30 from LS+wenet, 16 from Vox, 46 from AS
2024-08-20 05:31:43,545 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4677700.0, ans=0.0
2024-08-20 05:31:48,900 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 8400, loss[loss=0.1009, beats_loss=0.008982, ecapa_loss=0.0001453, whisper_loss=0.09048, over 22470.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01035, ecapa_loss=0.0001405, whisper_loss=0.09079, over 3868388.84 frames. ], batch size: 91, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 05:31:55,323 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4677800.0, ans=0.2
2024-08-20 05:32:01,574 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 15 from LS+wenet, 13 from Vox, 28 from AS
2024-08-20 05:32:10,807 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4677900.0, ans=0.1
2024-08-20 05:32:29,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4677900.0, ans=0.0
2024-08-20 05:32:36,142 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4678000.0, ans=0.2
2024-08-20 05:32:51,508 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4678000.0, ans=0.1
2024-08-20 05:32:55,963 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4678100.0, ans=0.125
2024-08-20 05:32:55,996 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4678100.0, ans=0.0
2024-08-20 05:33:44,243 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 8450, loss[loss=0.06361, beats_loss=0.01322, ecapa_loss=0.0001208, whisper_loss=0.04919, over 15978.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01034, ecapa_loss=0.0001408, whisper_loss=0.09059, over 3825817.00 frames. ], batch size: 67, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 05:34:03,919 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.797e+01 2.232e+01 2.452e+01 2.661e+01 1.500e+02, threshold=4.905e+01, percent-clipped=2.0
2024-08-20 05:34:19,511 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 27 from LS+wenet, 24 from Vox, 32 from AS
2024-08-20 05:34:42,914 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4678500.0, ans=0.125
2024-08-20 05:34:45,594 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4678500.0, ans=0.07
2024-08-20 05:35:06,254 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.93 vs. limit=10.0
2024-08-20 05:35:15,355 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4678700.0, ans=0.125
2024-08-20 05:35:26,607 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4678700.0, ans=0.125
2024-08-20 05:35:35,742 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 8500, loss[loss=0.08223, beats_loss=0.01012, ecapa_loss=0.0001629, whisper_loss=0.07047, over 17222.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01026, ecapa_loss=0.0001418, whisper_loss=0.09075, over 3824272.61 frames. ], batch size: 72, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 05:35:46,480 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4678800.0, ans=0.0
2024-08-20 05:35:50,429 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=4678800.0, ans=15.0
2024-08-20 05:35:52,426 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4678800.0, ans=0.1
2024-08-20 05:36:30,848 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 23 from LS+wenet, 24 from Vox, 33 from AS
2024-08-20 05:36:45,209 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4679100.0, ans=0.0
2024-08-20 05:36:51,905 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 34 from LS+wenet, 19 from Vox, 33 from AS
2024-08-20 05:36:52,123 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4679100.0, ans=0.0
2024-08-20 05:37:01,309 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4679100.0, ans=0.0
2024-08-20 05:37:30,691 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 8550, loss[loss=0.1122, beats_loss=0.007897, ecapa_loss=0.0001678, whisper_loss=0.1027, over 15968.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01025, ecapa_loss=0.0001429, whisper_loss=0.09077, over 3835111.29 frames. ], batch size: 63, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 05:37:46,792 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4679300.0, ans=0.125
2024-08-20 05:37:46,826 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4679300.0, ans=0.2
2024-08-20 05:37:50,666 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.232e+01 2.501e+01 2.726e+01 3.621e+01, threshold=5.003e+01, percent-clipped=0.0
2024-08-20 05:37:51,174 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4679400.0, ans=0.0
2024-08-20 05:37:53,667 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4679400.0, ans=0.0
2024-08-20 05:37:56,460 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4679400.0, ans=0.2
2024-08-20 05:38:03,526 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4679400.0, ans=0.09899494936611666
2024-08-20 05:38:44,080 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 19 from LS+wenet, 18 from Vox, 21 from AS
2024-08-20 05:39:07,310 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.27 vs. limit=15.0
2024-08-20 05:39:24,418 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4679700.0, ans=0.125
2024-08-20 05:39:27,859 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 8600, loss[loss=0.1285, beats_loss=0.008497, ecapa_loss=0.0001248, whisper_loss=0.1188, over 23768.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01024, ecapa_loss=0.0001411, whisper_loss=0.09083, over 3814261.48 frames. ], batch size: 91, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 05:39:35,598 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 32 from LS+wenet, 21 from Vox, 31 from AS
2024-08-20 05:40:00,311 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.02 vs. limit=15.0
2024-08-20 05:40:02,700 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4679900.0, ans=0.125
2024-08-20 05:40:26,980 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 25 from LS+wenet, 31 from Vox, 37 from AS
2024-08-20 05:40:42,850 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 24 from LS+wenet, 23 from Vox, 41 from AS
2024-08-20 05:40:43,118 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4680100.0, ans=0.125
2024-08-20 05:40:50,859 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4680100.0, ans=0.2
2024-08-20 05:40:58,627 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4680200.0, ans=0.125
2024-08-20 05:41:00,914 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.09 vs. limit=15.0
2024-08-20 05:41:11,067 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4680200.0, ans=0.125
2024-08-20 05:41:17,510 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 8650, loss[loss=0.09186, beats_loss=0.009822, ecapa_loss=0.0001751, whisper_loss=0.08029, over 15408.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0102, ecapa_loss=0.0001408, whisper_loss=0.0916, over 3825412.89 frames. ], batch size: 63, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 05:41:18,414 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.64 vs. limit=12.0
2024-08-20 05:41:31,666 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 24 from LS+wenet, 22 from Vox, 47 from AS
2024-08-20 05:41:39,191 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.692e+01 2.245e+01 2.496e+01 2.765e+01 3.926e+01, threshold=4.992e+01, percent-clipped=0.0
2024-08-20 05:41:48,754 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4680400.0, ans=0.07
2024-08-20 05:41:48,828 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4680400.0, ans=0.125
2024-08-20 05:42:05,509 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 16 from LS+wenet, 17 from Vox, 39 from AS
2024-08-20 05:42:21,584 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 24 from LS+wenet, 20 from Vox, 45 from AS
2024-08-20 05:42:33,366 INFO [train_multi_KD3.py:845] (1/4) A total of 97 cuts. 29 from LS+wenet, 21 from Vox, 47 from AS
2024-08-20 05:42:42,275 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4680600.0, ans=0.0
2024-08-20 05:42:47,459 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4680600.0, ans=0.125
2024-08-20 05:43:03,927 WARNING [optim.py:496] (1/4) Scaling gradients by 0.040249649435281754, model_norm_threshold=49.920475006103516
2024-08-20 05:43:04,084 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.09, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.371e+05, grad_sumsq=4.172e+04, orig_rms_sq=3.286e+00
2024-08-20 05:43:15,059 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 8700, loss[loss=0.09156, beats_loss=0.01206, ecapa_loss=0.0001544, whisper_loss=0.07796, over 18602.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01023, ecapa_loss=0.0001412, whisper_loss=0.09096, over 3817717.54 frames. ], batch size: 77, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 05:43:15,302 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 29 from LS+wenet, 24 from Vox, 28 from AS
2024-08-20 05:43:20,952 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=4680800.0, ans=0.95
2024-08-20 05:43:25,798 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 34 from LS+wenet, 17 from Vox, 42 from AS
2024-08-20 05:43:57,975 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4681000.0, ans=0.2
2024-08-20 05:44:07,332 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.996e+00
2024-08-20 05:44:21,273 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 25 from LS+wenet, 22 from Vox, 30 from AS
2024-08-20 05:44:28,921 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4681100.0, ans=0.1
2024-08-20 05:44:55,836 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.22 vs. limit=12.0
2024-08-20 05:44:58,965 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4681200.0, ans=0.125
2024-08-20 05:45:10,986 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 8750, loss[loss=0.1138, beats_loss=0.006057, ecapa_loss=0.0001746, whisper_loss=0.106, over 15428.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01024, ecapa_loss=0.0001405, whisper_loss=0.09165, over 3801835.29 frames. ], batch size: 62, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 05:45:30,121 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4681300.0, ans=0.125
2024-08-20 05:45:32,012 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.309e+01 2.539e+01 2.876e+01 1.240e+03, threshold=5.077e+01, percent-clipped=3.0
2024-08-20 05:45:46,839 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 24 from LS+wenet, 26 from Vox, 41 from AS
2024-08-20 05:46:01,302 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 24 from LS+wenet, 21 from Vox, 28 from AS
2024-08-20 05:46:13,034 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 37 from LS+wenet, 22 from Vox, 33 from AS
2024-08-20 05:46:23,344 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=4681600.0, ans=15.0
2024-08-20 05:46:27,899 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4681600.0, ans=0.1
2024-08-20 05:46:45,091 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4681700.0, ans=0.125
2024-08-20 05:46:48,887 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4681700.0, ans=0.0
2024-08-20 05:47:03,870 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 8800, loss[loss=0.1271, beats_loss=0.006422, ecapa_loss=0.0001412, whisper_loss=0.1192, over 18640.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0102, ecapa_loss=0.0001407, whisper_loss=0.09176, over 3789850.69 frames. ], batch size: 69, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 05:47:25,316 WARNING [optim.py:496] (1/4) Scaling gradients by 0.0940530002117157, model_norm_threshold=50.77210235595703
2024-08-20 05:47:25,474 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.3.norm.log_scale with proportion 0.21, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.105e+04, grad_sumsq=6.105e+04, orig_rms_sq=1.000e+00
2024-08-20 05:47:39,632 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 21 from LS+wenet, 15 from Vox, 37 from AS
2024-08-20 05:47:43,954 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 20 from LS+wenet, 12 from Vox, 21 from AS
2024-08-20 05:48:07,359 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 29 from LS+wenet, 25 from Vox, 39 from AS
2024-08-20 05:48:13,318 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4682100.0, ans=0.2
2024-08-20 05:48:17,054 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4682100.0, ans=0.04949747468305833
2024-08-20 05:48:26,771 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4682200.0, ans=0.0
2024-08-20 05:48:34,291 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4682200.0, ans=0.125
2024-08-20 05:48:41,340 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4682300.0, ans=0.0
2024-08-20 05:48:42,396 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 8850, loss[loss=0.1043, beats_loss=0.01016, ecapa_loss=0.0001207, whisper_loss=0.09298, over 22937.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01049, ecapa_loss=0.0001406, whisper_loss=0.09115, over 3774434.18 frames. ], batch size: 90, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 05:48:45,027 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 12 from LS+wenet, 23 from Vox, 27 from AS
2024-08-20 05:48:57,605 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=4682300.0, ans=0.5
2024-08-20 05:48:58,983 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.309e+01 2.540e+01 2.877e+01 5.398e+02, threshold=5.080e+01, percent-clipped=3.0
2024-08-20 05:49:11,850 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4682400.0, ans=0.1
2024-08-20 05:49:20,398 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4682500.0, ans=0.125
2024-08-20 05:49:28,049 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.56 vs. limit=10.0
2024-08-20 05:49:29,522 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4682500.0, ans=0.0
2024-08-20 05:49:29,743 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.whiten.whitening_limit, batch_count=4682500.0, ans=12.0
2024-08-20 05:49:35,307 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4682500.0, ans=0.05
2024-08-20 05:49:44,853 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4682600.0, ans=0.0
2024-08-20 05:50:13,768 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4682700.0, ans=0.125
2024-08-20 05:50:18,902 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 8900, loss[loss=0.1064, beats_loss=0.009515, ecapa_loss=0.0001334, whisper_loss=0.09559, over 22406.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01049, ecapa_loss=0.0001408, whisper_loss=0.09103, over 3802877.63 frames. ], batch size: 87, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 05:50:28,439 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=4682800.0, ans=0.05
2024-08-20 05:50:42,238 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.49 vs. limit=22.5
2024-08-20 05:50:48,990 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.40 vs. limit=15.0
2024-08-20 05:51:07,657 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4683000.0, ans=0.2
2024-08-20 05:51:09,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=4683000.0, ans=15.0
2024-08-20 05:51:44,642 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4683200.0, ans=0.0
2024-08-20 05:51:48,106 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4683200.0, ans=0.125
2024-08-20 05:51:51,886 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 29 from LS+wenet, 18 from Vox, 44 from AS
2024-08-20 05:51:55,964 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 8950, loss[loss=0.08403, beats_loss=0.01386, ecapa_loss=0.0001525, whisper_loss=0.06865, over 22084.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01053, ecapa_loss=0.0001411, whisper_loss=0.09037, over 3835419.84 frames. ], batch size: 94, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 05:51:57,723 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 21 from LS+wenet, 16 from Vox, 25 from AS
2024-08-20 05:52:04,025 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4683300.0, ans=0.125
2024-08-20 05:52:10,074 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 26 from LS+wenet, 20 from Vox, 32 from AS
2024-08-20 05:52:12,005 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.265e+01 2.516e+01 2.733e+01 4.609e+01, threshold=5.033e+01, percent-clipped=0.0
2024-08-20 05:52:14,230 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 15 from LS+wenet, 20 from Vox, 28 from AS
2024-08-20 05:52:24,982 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.56 vs. limit=15.0
2024-08-20 05:52:33,307 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 25 from LS+wenet, 22 from Vox, 33 from AS
2024-08-20 05:52:57,828 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.89 vs. limit=15.0
2024-08-20 05:53:00,826 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4683600.0, ans=0.0
2024-08-20 05:53:12,734 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4683700.0, ans=0.04949747468305833
2024-08-20 05:53:16,801 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 30 from LS+wenet, 25 from Vox, 33 from AS
2024-08-20 05:53:17,633 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.87 vs. limit=12.0
2024-08-20 05:53:25,538 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 9000, loss[loss=0.09582, beats_loss=0.0112, ecapa_loss=0.0001519, whisper_loss=0.08311, over 18929.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0105, ecapa_loss=0.0001414, whisper_loss=0.09007, over 3818557.86 frames. ], batch size: 76, lr: 1.91e-03, grad_scale: 1.152921504606847e+18
2024-08-20 05:53:25,539 INFO [train_multi_KD3.py:1140] (1/4) Computing validation loss
2024-08-20 05:54:02,497 INFO [train_multi_KD3.py:1150] (1/4) Epoch 32, validation on ASR_libri: loss=0.254, beats_loss=0, ecapa_loss=0.0005075, whisper_loss=0.2489, over 931116.00 frames.
2024-08-20 05:54:24,045 INFO [train_multi_KD3.py:1150] (1/4) Epoch 32, validation on SV_voxceleb1: loss=0.004011, beats_loss=0, ecapa_loss=0.0004011, whisper_loss=0, over 944235.00 frames.
2024-08-20 05:56:01,269 INFO [train_multi_KD3.py:1150] (1/4) Epoch 32, validation on AT_audioset: loss=0.02303, beats_loss=0.02303, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-20 05:56:01,273 INFO [train_multi_KD3.py:1156] (1/4) Maximum memory allocated so far is 30838MB 2024-08-20 05:56:09,879 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4683800.0, ans=0.0 2024-08-20 05:56:13,145 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4683800.0, ans=0.125 2024-08-20 05:56:16,841 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4683900.0, ans=0.0 2024-08-20 05:56:43,167 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 17 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-20 05:56:49,972 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 24 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-20 05:57:01,366 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4684100.0, ans=0.125 2024-08-20 05:57:01,678 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.41 vs. limit=15.0 2024-08-20 05:57:02,885 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=4684100.0, ans=0.02 2024-08-20 05:57:09,354 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 
21 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-20 05:57:20,674 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4684200.0, ans=0.125 2024-08-20 05:57:20,724 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4684200.0, ans=0.1 2024-08-20 05:57:24,001 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 9050, loss[loss=0.1206, beats_loss=0.009201, ecapa_loss=0.0001226, whisper_loss=0.1102, over 16109.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01054, ecapa_loss=0.000141, whisper_loss=0.09018, over 3836301.57 frames. ], batch size: 59, lr: 1.91e-03, grad_scale: 1.152921504606847e+18 2024-08-20 05:57:37,987 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.27 vs. limit=15.0 2024-08-20 05:57:38,530 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.742e+01 2.206e+01 2.470e+01 2.742e+01 4.296e+01, threshold=4.939e+01, percent-clipped=0.0 2024-08-20 05:57:42,593 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4684400.0, ans=0.125 2024-08-20 05:57:44,140 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.56 vs. limit=15.0 2024-08-20 05:58:16,029 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4684600.0, ans=0.1 2024-08-20 05:58:25,110 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 20 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-20 05:58:45,890 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 9100, loss[loss=0.08725, beats_loss=0.01005, ecapa_loss=0.000162, whisper_loss=0.07558, over 22677.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.01051, ecapa_loss=0.0001405, whisper_loss=0.09014, over 3822640.40 frames. ], batch size: 95, lr: 1.91e-03, grad_scale: 1.152921504606847e+18 2024-08-20 05:58:46,372 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4684800.0, ans=0.0 2024-08-20 05:59:05,815 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4684900.0, ans=0.125 2024-08-20 05:59:08,756 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 31 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-20 06:00:09,133 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 26 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-20 06:00:09,943 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.30 vs. limit=15.0 2024-08-20 06:00:10,209 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 9150, loss[loss=0.1136, beats_loss=0.01105, ecapa_loss=0.000128, whisper_loss=0.1012, over 17188.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01048, ecapa_loss=0.000141, whisper_loss=0.08999, over 3831157.37 frames. ], batch size: 68, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 06:00:17,415 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4685300.0, ans=0.125 2024-08-20 06:00:25,821 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 15 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-20 06:00:27,032 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.310e+01 2.550e+01 2.853e+01 1.227e+02, threshold=5.100e+01, percent-clipped=1.0 2024-08-20 06:00:40,053 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.62 vs. 
limit=15.0 2024-08-20 06:00:41,631 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4685400.0, ans=0.125 2024-08-20 06:00:46,932 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4685500.0, ans=0.125 2024-08-20 06:01:06,872 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4685600.0, ans=0.125 2024-08-20 06:01:33,025 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.67 vs. limit=10.0 2024-08-20 06:01:35,568 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 9200, loss[loss=0.1282, beats_loss=0.009083, ecapa_loss=0.0001366, whisper_loss=0.1177, over 23994.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01038, ecapa_loss=0.0001421, whisper_loss=0.0909, over 3846066.91 frames. ], batch size: 90, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 06:01:39,431 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4685800.0, ans=0.125 2024-08-20 06:02:01,500 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 28 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-20 06:02:01,627 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4685900.0, ans=0.125 2024-08-20 06:02:29,202 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4686100.0, ans=0.2 2024-08-20 06:03:02,852 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 9250, loss[loss=0.07954, beats_loss=0.01188, ecapa_loss=0.0001324, whisper_loss=0.06633, over 12349.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01042, ecapa_loss=0.0001413, whisper_loss=0.09058, over 3837870.09 frames. 
], batch size: 50, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 06:03:03,385 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4686300.0, ans=0.125 2024-08-20 06:03:12,214 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 28 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-20 06:03:15,496 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.83 vs. limit=10.0 2024-08-20 06:03:16,755 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4686300.0, ans=0.125 2024-08-20 06:03:20,029 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.761e+01 2.318e+01 2.500e+01 2.733e+01 3.571e+01, threshold=4.999e+01, percent-clipped=0.0 2024-08-20 06:03:24,587 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4686400.0, ans=0.0 2024-08-20 06:03:42,750 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.47 vs. limit=15.0 2024-08-20 06:03:44,389 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 20 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-20 06:03:53,095 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.62 vs. limit=6.0 2024-08-20 06:04:11,358 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.41 vs. limit=15.0 2024-08-20 06:04:24,951 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 
27 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-20 06:04:25,123 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4686700.0, ans=0.1 2024-08-20 06:04:27,716 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4686700.0, ans=0.2 2024-08-20 06:04:31,234 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 9300, loss[loss=0.07749, beats_loss=0.01093, ecapa_loss=0.0001754, whisper_loss=0.06481, over 11768.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01036, ecapa_loss=0.0001399, whisper_loss=0.09155, over 3843354.96 frames. ], batch size: 52, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 06:04:56,597 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4686900.0, ans=0.0 2024-08-20 06:06:04,213 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 31 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-20 06:06:08,689 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 9350, loss[loss=0.1098, beats_loss=0.008476, ecapa_loss=0.0001485, whisper_loss=0.0998, over 16377.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0104, ecapa_loss=0.0001403, whisper_loss=0.09139, over 3812602.23 frames. ], batch size: 63, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 06:06:28,518 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.952e+01 2.303e+01 2.586e+01 2.791e+01 3.756e+01, threshold=5.173e+01, percent-clipped=0.0 2024-08-20 06:06:29,381 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.87 vs. limit=12.0 2024-08-20 06:06:38,374 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.41 vs. 
limit=15.0 2024-08-20 06:06:40,222 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4687400.0, ans=0.125 2024-08-20 06:06:42,342 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten.whitening_limit, batch_count=4687400.0, ans=15.0 2024-08-20 06:06:50,891 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 19 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-20 06:07:28,571 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4687700.0, ans=0.125 2024-08-20 06:07:37,185 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=4687700.0, ans=0.95 2024-08-20 06:07:39,645 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 9400, loss[loss=0.1046, beats_loss=0.009882, ecapa_loss=0.000172, whisper_loss=0.09299, over 20716.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01045, ecapa_loss=0.0001404, whisper_loss=0.09112, over 3837453.69 frames. ], batch size: 87, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 06:07:56,909 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 31 from LS+wenet, 12 from Vox, 49 fro AS 2024-08-20 06:08:00,356 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4687900.0, ans=0.125 2024-08-20 06:08:08,753 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4687900.0, ans=0.04949747468305833 2024-08-20 06:08:15,223 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.42 vs. limit=15.0 2024-08-20 06:08:15,746 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 
29 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-20 06:08:32,046 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 29 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-20 06:08:49,804 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 15 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-20 06:08:55,331 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4688200.0, ans=0.125 2024-08-20 06:08:56,721 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 28 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-20 06:09:07,263 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4688300.0, ans=0.0 2024-08-20 06:09:08,340 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 9450, loss[loss=0.08939, beats_loss=0.01135, ecapa_loss=0.0001268, whisper_loss=0.07677, over 21833.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01049, ecapa_loss=0.0001403, whisper_loss=0.0909, over 3892809.44 frames. ], batch size: 88, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 06:09:17,093 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4688300.0, ans=0.1 2024-08-20 06:09:27,828 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.764e+01 2.376e+01 2.594e+01 2.934e+01 1.922e+02, threshold=5.189e+01, percent-clipped=1.0 2024-08-20 06:09:39,974 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4688400.0, ans=0.0 2024-08-20 06:10:36,260 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 9500, loss[loss=0.1088, beats_loss=0.009183, ecapa_loss=0.0001531, whisper_loss=0.09808, over 22478.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01047, ecapa_loss=0.0001403, whisper_loss=0.09095, over 3876637.78 frames. 
], batch size: 92, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 06:10:45,533 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 30 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-20 06:10:48,095 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.05 vs. limit=15.0 2024-08-20 06:11:26,855 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4689100.0, ans=0.1 2024-08-20 06:11:46,578 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 26 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-20 06:11:53,562 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4689200.0, ans=0.125 2024-08-20 06:12:03,404 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 9550, loss[loss=0.0959, beats_loss=0.01047, ecapa_loss=0.0001521, whisper_loss=0.08391, over 20129.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0105, ecapa_loss=0.0001403, whisper_loss=0.09089, over 3876859.01 frames. ], batch size: 82, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 06:12:21,198 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.849e+01 2.271e+01 2.487e+01 2.797e+01 1.341e+02, threshold=4.974e+01, percent-clipped=1.0 2024-08-20 06:12:21,705 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 18 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-20 06:12:49,812 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4689500.0, ans=0.0 2024-08-20 06:13:26,349 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.46 vs. limit=15.0 2024-08-20 06:13:31,036 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 
19 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-20 06:13:32,315 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 9600, loss[loss=0.09305, beats_loss=0.01242, ecapa_loss=0.0001198, whisper_loss=0.07944, over 17407.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0104, ecapa_loss=0.0001409, whisper_loss=0.0907, over 3802255.21 frames. ], batch size: 68, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 06:13:47,689 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4689900.0, ans=0.0 2024-08-20 06:13:51,633 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4689900.0, ans=0.125 2024-08-20 06:13:56,354 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 30 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-20 06:14:04,638 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4689900.0, ans=0.125 2024-08-20 06:14:06,872 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.84 vs. limit=15.0 2024-08-20 06:14:16,910 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.94 vs. limit=15.0 2024-08-20 06:14:43,377 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 29 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-20 06:14:43,624 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4690100.0, ans=10.0 2024-08-20 06:14:46,762 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 
18 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-20 06:15:06,759 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 9650, loss[loss=0.1057, beats_loss=0.009365, ecapa_loss=0.0001446, whisper_loss=0.0949, over 21648.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01041, ecapa_loss=0.0001412, whisper_loss=0.091, over 3823563.62 frames. ], batch size: 88, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:15:10,788 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4690300.0, ans=0.2 2024-08-20 06:15:16,479 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.58 vs. limit=15.0 2024-08-20 06:15:26,366 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.720e+01 2.336e+01 2.779e+01 3.042e+01 4.169e+01, threshold=5.558e+01, percent-clipped=0.0 2024-08-20 06:15:30,393 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 20 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-20 06:15:40,201 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4690400.0, ans=0.0 2024-08-20 06:16:01,409 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 19 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-20 06:16:32,794 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 9700, loss[loss=0.1078, beats_loss=0.01202, ecapa_loss=0.0001115, whisper_loss=0.09462, over 23917.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01046, ecapa_loss=0.0001405, whisper_loss=0.09001, over 3807240.20 frames. ], batch size: 92, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:16:36,742 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.78 vs. limit=6.0 2024-08-20 06:16:37,660 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 
17 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-20 06:16:39,579 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4690800.0, ans=0.125 2024-08-20 06:16:44,347 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4690800.0, ans=0.125 2024-08-20 06:16:56,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4690900.0, ans=0.125 2024-08-20 06:17:18,903 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4691000.0, ans=0.125 2024-08-20 06:17:29,036 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.66 vs. limit=15.0 2024-08-20 06:17:34,620 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 17 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-20 06:17:34,888 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4691100.0, ans=0.0 2024-08-20 06:17:54,435 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 9750, loss[loss=0.09434, beats_loss=0.01163, ecapa_loss=0.0001767, whisper_loss=0.08094, over 19336.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01052, ecapa_loss=0.0001402, whisper_loss=0.08957, over 3776424.71 frames. ], batch size: 87, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:18:04,791 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 24 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-20 06:18:11,708 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.59 vs. 
limit=15.0 2024-08-20 06:18:12,449 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.687e+01 2.245e+01 2.617e+01 2.841e+01 5.114e+01, threshold=5.235e+01, percent-clipped=0.0 2024-08-20 06:18:24,528 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4691400.0, ans=0.0 2024-08-20 06:18:43,559 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 26 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-20 06:19:07,159 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 06:19:08,601 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 24 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-20 06:19:11,897 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 29 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-20 06:19:16,750 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 9800, loss[loss=0.1217, beats_loss=0.00858, ecapa_loss=0.0001587, whisper_loss=0.1115, over 22121.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01053, ecapa_loss=0.00014, whisper_loss=0.08988, over 3799670.49 frames. ], batch size: 89, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:19:22,030 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4691800.0, ans=0.0 2024-08-20 06:19:25,618 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 26 from LS+wenet, 9 from Vox, 28 fro AS 2024-08-20 06:19:46,628 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 26 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-20 06:19:58,454 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 18 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-20 06:20:04,968 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 31 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-20 06:20:08,624 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 
26 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-20 06:20:10,300 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 19 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-20 06:20:27,434 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.74 vs. limit=15.0 2024-08-20 06:20:30,764 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4692200.0, ans=0.125 2024-08-20 06:20:30,968 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.47 vs. limit=15.0 2024-08-20 06:20:35,268 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4692200.0, ans=0.0 2024-08-20 06:20:39,085 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.47 vs. limit=22.5 2024-08-20 06:20:39,623 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 9850, loss[loss=0.09442, beats_loss=0.01256, ecapa_loss=0.0001288, whisper_loss=0.08056, over 16326.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01051, ecapa_loss=0.000139, whisper_loss=0.08979, over 3822252.77 frames. 
], batch size: 65, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:20:49,254 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4692300.0, ans=0.0 2024-08-20 06:20:58,001 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.366e+01 2.568e+01 2.856e+01 6.259e+01, threshold=5.136e+01, percent-clipped=2.0 2024-08-20 06:21:00,685 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4692400.0, ans=0.1 2024-08-20 06:21:46,898 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4692700.0, ans=0.0 2024-08-20 06:22:01,824 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 24 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-20 06:22:02,849 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 9900, loss[loss=0.09054, beats_loss=0.01195, ecapa_loss=0.0001306, whisper_loss=0.07728, over 21479.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01046, ecapa_loss=0.0001394, whisper_loss=0.08955, over 3845796.86 frames. ], batch size: 89, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:22:05,993 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=4692800.0, ans=6.0 2024-08-20 06:22:06,980 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 
15 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-20 06:22:07,221 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4692800.0, ans=0.125 2024-08-20 06:22:20,104 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4692900.0, ans=0.0 2024-08-20 06:22:24,873 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4692900.0, ans=0.0 2024-08-20 06:22:34,153 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4693000.0, ans=0.07 2024-08-20 06:23:08,734 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 18 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-20 06:23:08,934 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4693200.0, ans=0.0 2024-08-20 06:23:12,037 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 16 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-20 06:23:17,140 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 28 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-20 06:23:24,910 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 9950, loss[loss=0.09799, beats_loss=0.01072, ecapa_loss=0.0001086, whisper_loss=0.08618, over 16378.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01051, ecapa_loss=0.0001382, whisper_loss=0.0891, over 3804465.93 frames. ], batch size: 61, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:23:36,787 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 
23 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-20 06:23:38,702 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4693300.0, ans=0.1 2024-08-20 06:23:42,407 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.759e+01 2.243e+01 2.443e+01 2.710e+01 3.765e+01, threshold=4.885e+01, percent-clipped=0.0 2024-08-20 06:23:50,039 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 06:23:56,798 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4693500.0, ans=0.1 2024-08-20 06:24:04,365 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 24 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-20 06:24:11,859 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4693500.0, ans=0.125 2024-08-20 06:24:16,795 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4693600.0, ans=0.2 2024-08-20 06:24:21,562 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4693600.0, ans=0.1 2024-08-20 06:24:35,652 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4693700.0, ans=0.125 2024-08-20 06:24:37,126 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 19 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-20 06:24:38,743 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 
20 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-20 06:24:46,494 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4693700.0, ans=0.125 2024-08-20 06:24:49,106 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 10000, loss[loss=0.1216, beats_loss=0.009064, ecapa_loss=0.0001302, whisper_loss=0.1112, over 21520.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01049, ecapa_loss=0.0001387, whisper_loss=0.08938, over 3784880.20 frames. ], batch size: 82, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:24:54,581 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4693800.0, ans=0.04949747468305833 2024-08-20 06:25:01,617 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4693800.0, ans=0.1 2024-08-20 06:25:58,889 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 23 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-20 06:26:07,267 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4694200.0, ans=0.125 2024-08-20 06:26:19,063 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 10050, loss[loss=0.09555, beats_loss=0.01236, ecapa_loss=0.0001131, whisper_loss=0.08206, over 22629.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0105, ecapa_loss=0.0001395, whisper_loss=0.08959, over 3807437.21 frames. ], batch size: 91, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:26:22,629 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4694300.0, ans=0.125 2024-08-20 06:26:33,823 INFO [train_multi_KD3.py:845] (1/4) A total of 95 cuts. 
24 from LS+wenet, 28 from Vox, 43 fro AS 2024-08-20 06:26:37,061 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.304e+01 2.607e+01 2.920e+01 4.346e+01, threshold=5.214e+01, percent-clipped=0.0 2024-08-20 06:26:39,538 INFO [train_multi_KD3.py:845] (1/4) A total of 52 cuts. 14 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-20 06:26:46,479 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4694400.0, ans=0.2 2024-08-20 06:27:22,806 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.98 vs. limit=22.5 2024-08-20 06:27:47,732 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 10100, loss[loss=0.1071, beats_loss=0.007622, ecapa_loss=0.0001682, whisper_loss=0.09777, over 14378.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01043, ecapa_loss=0.0001401, whisper_loss=0.09001, over 3807452.86 frames. ], batch size: 57, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:28:32,549 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4695000.0, ans=0.0 2024-08-20 06:28:57,975 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4695100.0, ans=0.125 2024-08-20 06:29:01,606 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 
35 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-20 06:29:09,642 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4695200.0, ans=0.125 2024-08-20 06:29:15,211 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4695200.0, ans=0.125 2024-08-20 06:29:22,549 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 10150, loss[loss=0.103, beats_loss=0.01036, ecapa_loss=0.0001717, whisper_loss=0.09088, over 15708.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01054, ecapa_loss=0.0001396, whisper_loss=0.08995, over 3828540.82 frames. ], batch size: 65, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:29:44,565 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.187e+01 2.409e+01 2.808e+01 3.836e+01, threshold=4.818e+01, percent-clipped=0.0 2024-08-20 06:29:47,733 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.72 vs. limit=15.0 2024-08-20 06:29:51,709 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=4695400.0, ans=0.1 2024-08-20 06:30:07,892 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4695500.0, ans=0.125 2024-08-20 06:30:10,182 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4695500.0, ans=0.1 2024-08-20 06:30:28,909 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.47 vs. 
limit=6.0 2024-08-20 06:30:42,407 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4695700.0, ans=0.0 2024-08-20 06:30:58,978 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4695700.0, ans=0.125 2024-08-20 06:31:02,373 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 10200, loss[loss=0.1012, beats_loss=0.01175, ecapa_loss=0.0001461, whisper_loss=0.08796, over 22245.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01054, ecapa_loss=0.0001398, whisper_loss=0.0903, over 3811154.49 frames. ], batch size: 91, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:31:47,241 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 19 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-20 06:31:51,171 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 39 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-20 06:32:06,918 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 25 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-20 06:32:25,451 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4696200.0, ans=0.1 2024-08-20 06:32:30,971 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4696200.0, ans=0.0 2024-08-20 06:32:31,288 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.13 vs. limit=15.0 2024-08-20 06:32:39,477 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 10250, loss[loss=0.0909, beats_loss=0.01232, ecapa_loss=0.0001208, whisper_loss=0.07737, over 22450.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0105, ecapa_loss=0.0001402, whisper_loss=0.09023, over 3811258.06 frames. 
], batch size: 90, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:32:44,869 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4696300.0, ans=0.0 2024-08-20 06:32:55,043 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4696300.0, ans=0.125 2024-08-20 06:32:55,746 WARNING [optim.py:496] (1/4) Scaling gradients by 0.07058558613061905, model_norm_threshold=48.17802047729492 2024-08-20 06:32:55,905 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.542e+04, grad_sumsq=7.542e+04, orig_rms_sq=1.000e+00 2024-08-20 06:33:01,507 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.230e+01 2.493e+01 2.759e+01 6.825e+02, threshold=4.986e+01, percent-clipped=2.0 2024-08-20 06:33:15,051 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4696400.0, ans=0.05 2024-08-20 06:33:21,488 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4696500.0, ans=0.0 2024-08-20 06:33:27,149 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4696500.0, ans=0.125 2024-08-20 06:33:28,531 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4696500.0, ans=0.0 2024-08-20 06:33:44,388 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4696600.0, ans=0.125 2024-08-20 06:33:54,522 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 
27 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-20 06:34:02,428 WARNING [optim.py:496] (1/4) Scaling gradients by 0.07055973261594772, model_norm_threshold=49.85801315307617 2024-08-20 06:34:02,583 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.09, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.714e+04, grad_sumsq=4.714e+04, orig_rms_sq=1.000e+00 2024-08-20 06:34:21,202 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 34 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-20 06:34:22,326 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 10300, loss[loss=0.1397, beats_loss=0.008392, ecapa_loss=0.0001368, whisper_loss=0.1299, over 19515.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01042, ecapa_loss=0.0001415, whisper_loss=0.09091, over 3809361.80 frames. ], batch size: 72, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:34:33,830 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4696800.0, ans=0.125 2024-08-20 06:34:33,845 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=4696800.0, ans=0.025 2024-08-20 06:34:44,937 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4696900.0, ans=0.125 2024-08-20 06:35:26,285 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4697100.0, ans=0.125 2024-08-20 06:35:28,893 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.88 vs. limit=6.0 2024-08-20 06:35:36,163 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 
36 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-20 06:35:46,974 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=4.251e-01 2024-08-20 06:35:51,228 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4697200.0, ans=0.0 2024-08-20 06:36:03,825 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.67 vs. limit=15.0 2024-08-20 06:36:04,311 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 10350, loss[loss=0.09381, beats_loss=0.009173, ecapa_loss=0.0001489, whisper_loss=0.08315, over 23626.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01055, ecapa_loss=0.0001403, whisper_loss=0.09015, over 3837403.53 frames. ], batch size: 96, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:36:10,734 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4697300.0, ans=0.0 2024-08-20 06:36:14,562 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4697300.0, ans=0.125 2024-08-20 06:36:18,418 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 
30 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-20 06:36:27,484 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.772e+01 2.261e+01 2.488e+01 2.810e+01 7.066e+02, threshold=4.977e+01, percent-clipped=3.0 2024-08-20 06:36:42,455 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=4697400.0, ans=10.0 2024-08-20 06:36:44,922 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4697500.0, ans=0.125 2024-08-20 06:36:49,245 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.76 vs. limit=15.0 2024-08-20 06:37:32,152 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4697700.0, ans=0.125 2024-08-20 06:37:36,289 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4697700.0, ans=0.125 2024-08-20 06:37:45,826 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 25 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-20 06:37:47,029 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 10400, loss[loss=0.09794, beats_loss=0.01209, ecapa_loss=0.0001264, whisper_loss=0.08458, over 22367.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01057, ecapa_loss=0.0001396, whisper_loss=0.08926, over 3833451.62 frames. ], batch size: 92, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:38:10,421 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.71 vs. limit=15.0 2024-08-20 06:38:45,019 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.88 vs. 
limit=6.0 2024-08-20 06:38:46,642 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 32 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-20 06:38:48,981 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 32 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-20 06:39:00,038 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4698100.0, ans=0.1 2024-08-20 06:39:05,783 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 30 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-20 06:39:07,458 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 33 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-20 06:39:31,458 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 10450, loss[loss=0.126, beats_loss=0.007795, ecapa_loss=0.0001346, whisper_loss=0.1168, over 18601.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01048, ecapa_loss=0.00014, whisper_loss=0.08978, over 3854832.20 frames. ], batch size: 67, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:39:36,559 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4698300.0, ans=0.0 2024-08-20 06:39:44,616 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 27 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-20 06:39:53,389 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.263e+01 2.457e+01 2.777e+01 9.468e+01, threshold=4.915e+01, percent-clipped=2.0 2024-08-20 06:40:07,939 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.48 vs. limit=22.5 2024-08-20 06:40:17,793 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 
24 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-20 06:40:20,829 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4698500.0, ans=0.2 2024-08-20 06:40:31,183 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 18 from LS+wenet, 14 from Vox, 18 fro AS 2024-08-20 06:40:31,868 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4698600.0, ans=0.125 2024-08-20 06:40:38,125 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 24 from LS+wenet, 35 from Vox, 30 fro AS 2024-08-20 06:40:55,113 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 06:41:00,396 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 13 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-20 06:41:11,109 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 10500, loss[loss=0.1171, beats_loss=0.009051, ecapa_loss=0.0001518, whisper_loss=0.1065, over 15290.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01051, ecapa_loss=0.0001401, whisper_loss=0.08999, over 3859553.52 frames. ], batch size: 58, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:41:12,154 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 40 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-20 06:41:18,847 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 13 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-20 06:41:41,401 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4698900.0, ans=0.125 2024-08-20 06:41:51,833 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.80 vs. 
limit=15.0 2024-08-20 06:42:10,114 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4699000.0, ans=0.0 2024-08-20 06:42:36,011 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4699200.0, ans=0.0 2024-08-20 06:42:38,050 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 26 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-20 06:42:40,146 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 22 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-20 06:42:46,352 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 29 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-20 06:42:53,602 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 30 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-20 06:42:54,630 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 10550, loss[loss=0.1046, beats_loss=0.01063, ecapa_loss=0.000114, whisper_loss=0.09278, over 23981.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01054, ecapa_loss=0.0001401, whisper_loss=0.08938, over 3870628.51 frames. ], batch size: 91, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:42:59,686 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4699300.0, ans=0.2 2024-08-20 06:43:17,158 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0 2024-08-20 06:43:17,677 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.308e+01 2.564e+01 2.826e+01 3.881e+01, threshold=5.129e+01, percent-clipped=0.0 2024-08-20 06:43:20,947 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=4699400.0, ans=15.0 2024-08-20 06:43:24,905 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 
26 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-20 06:44:02,821 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 14 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-20 06:44:11,941 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.04 vs. limit=15.0 2024-08-20 06:44:24,437 INFO [train_multi_KD3.py:845] (1/4) A total of 96 cuts. 31 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-20 06:44:26,584 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.98 vs. limit=15.0 2024-08-20 06:44:34,953 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4699700.0, ans=0.125 2024-08-20 06:44:38,382 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 10600, loss[loss=0.1025, beats_loss=0.00884, ecapa_loss=0.0001491, whisper_loss=0.09219, over 19334.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01051, ecapa_loss=0.0001392, whisper_loss=0.08902, over 3834705.95 frames. ], batch size: 79, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:44:49,156 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4699800.0, ans=0.125 2024-08-20 06:44:58,802 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4699900.0, ans=0.09899494936611666 2024-08-20 06:45:17,523 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.94 vs. 
limit=15.0 2024-08-20 06:45:28,856 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4700000.0, ans=0.0 2024-08-20 06:45:46,269 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=4700100.0, ans=6.0 2024-08-20 06:45:48,103 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4700100.0, ans=0.0 2024-08-20 06:45:56,651 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4700100.0, ans=0.125 2024-08-20 06:46:04,446 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4700200.0, ans=0.0 2024-08-20 06:46:21,382 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4700200.0, ans=0.125 2024-08-20 06:46:24,149 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4700300.0, ans=0.0 2024-08-20 06:46:24,870 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 10650, loss[loss=0.08916, beats_loss=0.01123, ecapa_loss=0.0001328, whisper_loss=0.0766, over 20062.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01046, ecapa_loss=0.0001388, whisper_loss=0.0888, over 3805704.63 frames. 
], batch size: 79, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:46:31,096 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4700300.0, ans=0.125 2024-08-20 06:46:40,699 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4700300.0, ans=0.0 2024-08-20 06:46:46,907 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.694e+01 2.297e+01 2.515e+01 2.879e+01 5.897e+01, threshold=5.029e+01, percent-clipped=1.0 2024-08-20 06:47:53,265 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=4700700.0, ans=0.02 2024-08-20 06:47:57,210 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4700700.0, ans=0.125 2024-08-20 06:48:00,029 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4700700.0, ans=0.125 2024-08-20 06:48:01,467 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 34 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-20 06:48:03,524 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4700700.0, ans=0.0 2024-08-20 06:48:09,992 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 10700, loss[loss=0.1142, beats_loss=0.009013, ecapa_loss=0.0001004, whisper_loss=0.1042, over 15857.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01048, ecapa_loss=0.0001392, whisper_loss=0.08891, over 3795535.80 frames. 
], batch size: 55, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:48:11,798 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4700800.0, ans=0.1 2024-08-20 06:48:25,080 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4700800.0, ans=0.125 2024-08-20 06:48:30,024 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 17 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-20 06:48:35,664 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 23 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-20 06:48:47,697 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4701000.0, ans=0.0 2024-08-20 06:48:47,708 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4701000.0, ans=0.1 2024-08-20 06:48:58,179 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4701000.0, ans=0.0 2024-08-20 06:49:15,451 WARNING [optim.py:496] (1/4) Scaling gradients by 0.023652782663702965, model_norm_threshold=50.29466247558594 2024-08-20 06:49:15,645 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.37, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.689e+06, grad_sumsq=1.581e+08, orig_rms_sq=1.068e-02 2024-08-20 06:49:20,290 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4701100.0, ans=0.07 2024-08-20 06:49:42,385 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 16 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-20 06:49:50,255 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 10750, loss[loss=0.09206, beats_loss=0.01171, ecapa_loss=9.82e-05, whisper_loss=0.07938, over 14725.00 frames. 
], tot_loss[loss=0.1001, beats_loss=0.01057, ecapa_loss=0.0001385, whisper_loss=0.08818, over 3768206.51 frames. ], batch size: 57, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:49:53,201 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.37 vs. limit=22.5 2024-08-20 06:49:58,531 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4701300.0, ans=0.07 2024-08-20 06:50:08,182 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4701400.0, ans=0.125 2024-08-20 06:50:10,641 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4701400.0, ans=0.125 2024-08-20 06:50:11,680 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.335e+01 2.524e+01 2.835e+01 2.126e+03, threshold=5.048e+01, percent-clipped=3.0 2024-08-20 06:50:19,047 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4701400.0, ans=0.125 2024-08-20 06:50:24,947 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4701400.0, ans=0.125 2024-08-20 06:50:59,648 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4701600.0, ans=0.1 2024-08-20 06:51:01,874 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 24 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-20 06:51:04,586 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=27.98 vs. limit=22.5 2024-08-20 06:51:12,130 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 
19 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-20 06:51:12,753 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4701700.0, ans=0.05 2024-08-20 06:51:29,641 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 10800, loss[loss=0.1067, beats_loss=0.01065, ecapa_loss=0.000163, whisper_loss=0.09443, over 20393.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01064, ecapa_loss=0.000138, whisper_loss=0.0883, over 3784074.42 frames. ], batch size: 84, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:51:32,631 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4701800.0, ans=0.0 2024-08-20 06:51:34,626 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4701800.0, ans=0.125 2024-08-20 06:51:52,553 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4701900.0, ans=0.0 2024-08-20 06:51:53,890 INFO [train_multi_KD3.py:845] (1/4) A total of 97 cuts. 34 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-20 06:52:00,636 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4701900.0, ans=0.0 2024-08-20 06:52:08,062 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4702000.0, ans=0.125 2024-08-20 06:52:11,639 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 
15 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-20 06:52:28,040 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4702100.0, ans=0.0 2024-08-20 06:52:42,405 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4702100.0, ans=0.0 2024-08-20 06:52:48,869 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 23 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-20 06:53:08,926 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 10850, loss[loss=0.111, beats_loss=0.006679, ecapa_loss=0.0001701, whisper_loss=0.1026, over 22143.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01052, ecapa_loss=0.0001377, whisper_loss=0.08929, over 3803220.41 frames. ], batch size: 89, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:53:21,912 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.55 vs. limit=12.0 2024-08-20 06:53:30,963 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.817e+01 2.288e+01 2.456e+01 2.756e+01 3.873e+01, threshold=4.912e+01, percent-clipped=0.0 2024-08-20 06:53:32,160 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4702400.0, ans=0.0 2024-08-20 06:53:39,805 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4702400.0, ans=0.125 2024-08-20 06:53:51,614 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4702500.0, ans=0.125 2024-08-20 06:54:00,973 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 06:54:04,988 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 
26 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-20 06:54:21,762 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4702600.0, ans=0.0 2024-08-20 06:54:45,108 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4702700.0, ans=0.125 2024-08-20 06:54:46,423 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 25 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-20 06:54:48,405 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 10900, loss[loss=0.118, beats_loss=0.009697, ecapa_loss=0.0001585, whisper_loss=0.1067, over 16824.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0104, ecapa_loss=0.0001393, whisper_loss=0.09037, over 3813663.51 frames. ], batch size: 71, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:54:48,972 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4702800.0, ans=0.1 2024-08-20 06:55:22,563 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.22 vs. limit=10.0 2024-08-20 06:55:23,798 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 31 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-20 06:55:31,180 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 23 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-20 06:55:40,414 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.87 vs. limit=8.0 2024-08-20 06:56:00,865 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 17 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-20 06:56:25,276 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 10950, loss[loss=0.1126, beats_loss=0.009649, ecapa_loss=0.0001636, whisper_loss=0.1014, over 22963.00 frames. 
], tot_loss[loss=0.1022, beats_loss=0.01038, ecapa_loss=0.0001393, whisper_loss=0.09047, over 3836130.90 frames. ], batch size: 90, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:56:30,451 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 22 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-20 06:56:34,147 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 24 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-20 06:56:47,529 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.264e+01 2.421e+01 2.646e+01 4.130e+01, threshold=4.843e+01, percent-clipped=0.0 2024-08-20 06:56:56,544 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4703400.0, ans=0.125 2024-08-20 06:56:57,325 WARNING [optim.py:496] (1/4) Scaling gradients by 0.04336608201265335, model_norm_threshold=48.42934799194336 2024-08-20 06:56:57,482 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.18, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.187e+05, grad_sumsq=2.051e+07, orig_rms_sq=1.067e-02 2024-08-20 06:57:05,390 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4703500.0, ans=0.2 2024-08-20 06:57:09,774 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4703500.0, ans=0.0 2024-08-20 06:57:13,831 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4703500.0, ans=0.0 2024-08-20 06:57:38,846 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.53 vs. 
limit=15.0 2024-08-20 06:57:43,444 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4703700.0, ans=0.125 2024-08-20 06:57:56,659 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 11000, loss[loss=0.1165, beats_loss=0.0108, ecapa_loss=0.0001129, whisper_loss=0.1046, over 14879.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01038, ecapa_loss=0.0001386, whisper_loss=0.09117, over 3833032.19 frames. ], batch size: 56, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:57:57,460 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.90 vs. limit=15.0 2024-08-20 06:58:03,354 WARNING [optim.py:496] (1/4) Scaling gradients by 0.06608612835407257, model_norm_threshold=48.42934799194336 2024-08-20 06:58:03,512 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.17, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.920e+04, grad_sumsq=8.920e+04, orig_rms_sq=1.000e+00 2024-08-20 06:58:03,807 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 31 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-20 06:58:27,587 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.18 vs. 
limit=15.0 2024-08-20 06:58:28,331 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4703900.0, ans=0.1 2024-08-20 06:58:38,779 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4704000.0, ans=0.2 2024-08-20 06:58:43,808 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4704000.0, ans=0.2 2024-08-20 06:58:47,119 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4704100.0, ans=0.125 2024-08-20 06:58:49,404 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4704100.0, ans=0.0 2024-08-20 06:58:52,358 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4704100.0, ans=0.125 2024-08-20 06:58:55,323 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 31 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-20 06:58:57,277 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4704100.0, ans=0.1 2024-08-20 06:59:04,862 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4704200.0, ans=0.2 2024-08-20 06:59:21,962 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 21 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-20 06:59:23,039 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 11050, loss[loss=0.1075, beats_loss=0.01078, ecapa_loss=0.0001206, whisper_loss=0.09556, over 17239.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01039, ecapa_loss=0.0001394, whisper_loss=0.09147, over 3839435.17 frames. 
], batch size: 65, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:59:43,210 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.346e+01 2.551e+01 2.950e+01 1.117e+03, threshold=5.103e+01, percent-clipped=5.0 2024-08-20 07:00:02,095 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4704500.0, ans=0.125 2024-08-20 07:00:07,560 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 30 from LS+wenet, 15 from Vox, 46 fro AS 2024-08-20 07:00:12,650 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4704500.0, ans=0.125 2024-08-20 07:00:18,402 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.507e-01 2024-08-20 07:00:23,680 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 17 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-20 07:00:25,737 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4704600.0, ans=0.125 2024-08-20 07:00:57,183 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 11100, loss[loss=0.09609, beats_loss=0.01274, ecapa_loss=0.0001208, whisper_loss=0.08215, over 22816.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01044, ecapa_loss=0.0001394, whisper_loss=0.09133, over 3838849.06 frames. ], batch size: 91, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 07:01:04,507 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4704800.0, ans=0.0 2024-08-20 07:01:27,360 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 33 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-20 07:01:33,536 INFO [train_multi_KD3.py:845] (1/4) A total of 54 cuts. 
19 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-20 07:01:58,213 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4705100.0, ans=0.125 2024-08-20 07:02:01,629 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 17 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-20 07:02:12,792 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 33 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-20 07:02:16,317 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 27 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-20 07:02:19,993 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 31 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-20 07:02:32,307 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 19 from LS+wenet, 12 from Vox, 19 fro AS 2024-08-20 07:02:35,154 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 11150, loss[loss=0.1102, beats_loss=0.0103, ecapa_loss=0.0001017, whisper_loss=0.09888, over 14745.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01039, ecapa_loss=0.0001401, whisper_loss=0.09142, over 3872042.89 frames. ], batch size: 55, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 07:02:35,415 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 20 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-20 07:02:40,873 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.59 vs. limit=15.0 2024-08-20 07:02:44,018 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4705300.0, ans=0.0 2024-08-20 07:02:44,539 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.79 vs. limit=22.5 2024-08-20 07:02:58,764 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 
28 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-20 07:02:59,645 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.354e+01 2.488e+01 2.886e+01 1.211e+02, threshold=4.976e+01, percent-clipped=2.0 2024-08-20 07:03:08,405 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4705400.0, ans=0.125 2024-08-20 07:03:24,009 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 27 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-20 07:03:43,773 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.99 vs. limit=22.5 2024-08-20 07:04:01,764 WARNING [optim.py:496] (1/4) Scaling gradients by 0.09045316278934479, model_norm_threshold=49.755611419677734 2024-08-20 07:04:01,924 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.0.norm.log_scale with proportion 0.11, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.284e+04, grad_sumsq=3.284e+04, orig_rms_sq=1.000e+00 2024-08-20 07:04:21,343 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 11200, loss[loss=0.102, beats_loss=0.009922, ecapa_loss=0.0001504, whisper_loss=0.09056, over 23398.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01039, ecapa_loss=0.0001397, whisper_loss=0.09128, over 3910341.07 frames. ], batch size: 96, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 07:04:32,250 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4705800.0, ans=0.125 2024-08-20 07:04:36,572 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 20 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-20 07:04:36,782 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4705800.0, ans=0.125 2024-08-20 07:04:46,175 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 
24 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-20 07:04:52,166 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4705900.0, ans=0.125 2024-08-20 07:04:58,801 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=4705900.0, ans=10.0 2024-08-20 07:04:59,039 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.44 vs. limit=22.5 2024-08-20 07:05:14,294 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4706000.0, ans=0.0 2024-08-20 07:05:16,981 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.51 vs. limit=12.0 2024-08-20 07:05:29,321 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.26 vs. limit=22.5 2024-08-20 07:05:47,551 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4706200.0, ans=0.0 2024-08-20 07:05:47,604 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4706200.0, ans=0.0 2024-08-20 07:05:56,014 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4706200.0, ans=0.0 2024-08-20 07:06:00,560 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 11250, loss[loss=0.1033, beats_loss=0.01225, ecapa_loss=0.0001231, whisper_loss=0.08986, over 21729.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01055, ecapa_loss=0.0001394, whisper_loss=0.09021, over 3910458.92 frames. 
], batch size: 87, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 07:06:23,082 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.303e+01 2.565e+01 2.928e+01 5.501e+02, threshold=5.130e+01, percent-clipped=1.0 2024-08-20 07:06:28,565 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 27 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-20 07:06:33,641 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4706400.0, ans=0.2 2024-08-20 07:06:45,314 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4706500.0, ans=0.0 2024-08-20 07:06:49,392 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4706500.0, ans=0.125 2024-08-20 07:07:20,409 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 33 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-20 07:07:39,903 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4706800.0, ans=0.0 2024-08-20 07:07:40,632 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 11300, loss[loss=0.1024, beats_loss=0.01007, ecapa_loss=0.0001583, whisper_loss=0.09075, over 22801.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01055, ecapa_loss=0.0001399, whisper_loss=0.09005, over 3934448.34 frames. ], batch size: 91, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 07:07:48,685 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4706800.0, ans=0.125 2024-08-20 07:07:53,649 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 
24 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-20 07:08:00,677 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4706900.0, ans=0.1 2024-08-20 07:08:11,035 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 16 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-20 07:08:16,281 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 18 from LS+wenet, 14 from Vox, 19 fro AS 2024-08-20 07:08:39,246 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4707100.0, ans=0.1 2024-08-20 07:08:46,358 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 12 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-20 07:08:46,571 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4707100.0, ans=0.125 2024-08-20 07:08:46,647 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4707100.0, ans=0.125 2024-08-20 07:08:48,061 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.66 vs. limit=22.5 2024-08-20 07:09:15,356 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4707300.0, ans=0.2 2024-08-20 07:09:16,184 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 11350, loss[loss=0.09893, beats_loss=0.01108, ecapa_loss=0.000136, whisper_loss=0.0865, over 22518.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01046, ecapa_loss=0.00014, whisper_loss=0.09036, over 3897918.10 frames. 
], batch size: 93, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 07:09:36,088 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4707400.0, ans=0.125 2024-08-20 07:09:36,765 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.698e+01 2.220e+01 2.467e+01 2.786e+01 5.186e+01, threshold=4.935e+01, percent-clipped=1.0 2024-08-20 07:10:02,319 INFO [train_multi_KD3.py:845] (1/4) A total of 95 cuts. 31 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-20 07:10:21,006 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 21 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-20 07:10:25,732 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4707600.0, ans=0.0 2024-08-20 07:10:49,679 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 11400, loss[loss=0.08037, beats_loss=0.01095, ecapa_loss=0.0001762, whisper_loss=0.06766, over 13593.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0104, ecapa_loss=0.0001395, whisper_loss=0.09058, over 3871978.83 frames. ], batch size: 57, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 07:11:06,177 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4707800.0, ans=0.5 2024-08-20 07:11:47,852 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 32 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-20 07:12:09,297 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 
19 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-20 07:12:15,426 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4708200.0, ans=0.1 2024-08-20 07:12:18,683 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4708200.0, ans=0.125 2024-08-20 07:12:19,122 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.96 vs. limit=10.0 2024-08-20 07:12:23,558 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 11450, loss[loss=0.1001, beats_loss=0.01126, ecapa_loss=0.000133, whisper_loss=0.08747, over 21759.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01046, ecapa_loss=0.0001395, whisper_loss=0.09064, over 3860892.53 frames. ], batch size: 88, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 07:12:45,494 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.698e+01 2.318e+01 2.506e+01 2.910e+01 4.315e+01, threshold=5.012e+01, percent-clipped=0.0 2024-08-20 07:12:48,432 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4708400.0, ans=0.015 2024-08-20 07:13:21,344 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.70 vs. limit=12.0 2024-08-20 07:13:29,602 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 30 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-20 07:13:41,625 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4708700.0, ans=0.07 2024-08-20 07:13:46,024 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 35 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-20 07:13:49,941 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 
19 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-20 07:13:58,861 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 11500, loss[loss=0.1069, beats_loss=0.0116, ecapa_loss=0.0001304, whisper_loss=0.094, over 23060.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01052, ecapa_loss=0.0001397, whisper_loss=0.08973, over 3879085.96 frames. ], batch size: 91, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 07:14:00,328 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4708800.0, ans=0.125 2024-08-20 07:14:07,921 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 28 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-20 07:14:10,044 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 24 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-20 07:14:21,058 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4708900.0, ans=0.125 2024-08-20 07:14:29,126 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4708900.0, ans=0.04949747468305833 2024-08-20 07:14:30,859 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4708900.0, ans=0.125 2024-08-20 07:14:38,654 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 23 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-20 07:14:38,870 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4709000.0, ans=0.125 2024-08-20 07:14:51,236 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 
28 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-20 07:15:05,490 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4709100.0, ans=0.0 2024-08-20 07:15:39,198 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 11550, loss[loss=0.09699, beats_loss=0.01063, ecapa_loss=0.0001276, whisper_loss=0.08508, over 16550.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01043, ecapa_loss=0.0001386, whisper_loss=0.09062, over 3856500.10 frames. ], batch size: 64, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 07:15:47,995 INFO [train_multi_KD3.py:845] (1/4) A total of 49 cuts. 13 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-20 07:16:01,887 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.654e+01 2.302e+01 2.543e+01 2.812e+01 2.319e+02, threshold=5.086e+01, percent-clipped=3.0 2024-08-20 07:16:19,600 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 26 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-20 07:16:27,179 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 26 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-20 07:16:44,238 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4709600.0, ans=0.125 2024-08-20 07:16:56,785 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4709600.0, ans=0.125 2024-08-20 07:17:25,997 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 11600, loss[loss=0.08595, beats_loss=0.01147, ecapa_loss=0.0001629, whisper_loss=0.07286, over 17323.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01046, ecapa_loss=0.0001389, whisper_loss=0.09094, over 3880456.99 frames. 
], batch size: 68, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 07:17:32,249 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4709800.0, ans=0.09899494936611666 2024-08-20 07:17:45,867 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4709900.0, ans=0.1 2024-08-20 07:17:51,753 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4709900.0, ans=0.125 2024-08-20 07:17:57,302 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4709900.0, ans=0.1 2024-08-20 07:17:57,477 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4709900.0, ans=0.125 2024-08-20 07:19:00,894 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4710200.0, ans=0.125 2024-08-20 07:19:09,675 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 11650, loss[loss=0.08228, beats_loss=0.01154, ecapa_loss=0.0001583, whisper_loss=0.06915, over 12835.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01041, ecapa_loss=0.0001392, whisper_loss=0.09114, over 3831667.08 frames. ], batch size: 55, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:19:33,344 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.347e+01 2.610e+01 2.991e+01 4.037e+01, threshold=5.219e+01, percent-clipped=0.0 2024-08-20 07:20:10,397 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 
19 from LS+wenet, 10 from Vox, 40 fro AS 2024-08-20 07:20:28,693 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4710600.0, ans=0.1 2024-08-20 07:20:51,475 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 11700, loss[loss=0.1156, beats_loss=0.009643, ecapa_loss=0.0001472, whisper_loss=0.1045, over 14427.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01038, ecapa_loss=0.0001397, whisper_loss=0.09097, over 3817081.01 frames. ], batch size: 57, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:20:52,391 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4710800.0, ans=0.125 2024-08-20 07:21:12,373 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4710900.0, ans=0.125 2024-08-20 07:21:14,547 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 21 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-20 07:21:25,226 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4710900.0, ans=0.125 2024-08-20 07:21:26,505 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 21 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-20 07:21:37,159 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.86 vs. limit=15.0 2024-08-20 07:21:40,735 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 
26 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-20 07:21:43,794 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4711000.0, ans=0.0 2024-08-20 07:22:04,670 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4711100.0, ans=0.125 2024-08-20 07:22:32,001 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 11750, loss[loss=0.1119, beats_loss=0.01107, ecapa_loss=0.0001077, whisper_loss=0.09976, over 22789.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01045, ecapa_loss=0.0001386, whisper_loss=0.09134, over 3828197.63 frames. ], batch size: 89, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:22:55,005 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.706e+01 2.306e+01 2.569e+01 2.913e+01 4.079e+01, threshold=5.137e+01, percent-clipped=0.0 2024-08-20 07:23:34,537 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.48 vs. limit=15.0 2024-08-20 07:23:51,786 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4711600.0, ans=0.125 2024-08-20 07:24:01,552 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 34 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-20 07:24:05,940 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4711700.0, ans=0.0 2024-08-20 07:24:10,022 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 35 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-20 07:24:14,561 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 32 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-20 07:24:18,354 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 11800, loss[loss=0.0937, beats_loss=0.01218, ecapa_loss=0.0001212, whisper_loss=0.08031, over 18160.00 frames. 
], tot_loss[loss=0.1032, beats_loss=0.01036, ecapa_loss=0.0001402, whisper_loss=0.09143, over 3857017.41 frames. ], batch size: 71, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:24:21,708 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 25 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-20 07:25:04,302 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4712000.0, ans=0.125 2024-08-20 07:25:24,609 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 32 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-20 07:25:37,082 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 32 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-20 07:25:37,404 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4712100.0, ans=0.125 2024-08-20 07:25:41,993 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=4712200.0, ans=0.95 2024-08-20 07:25:47,639 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.59 vs. limit=15.0 2024-08-20 07:25:59,668 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 34 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-20 07:26:02,563 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 11850, loss[loss=0.09165, beats_loss=0.01218, ecapa_loss=0.0001068, whisper_loss=0.0784, over 21996.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01039, ecapa_loss=0.0001391, whisper_loss=0.09113, over 3858157.37 frames. 
], batch size: 86, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:26:04,556 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4712300.0, ans=0.0 2024-08-20 07:26:26,632 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.743e+01 2.335e+01 2.520e+01 2.883e+01 3.441e+02, threshold=5.040e+01, percent-clipped=1.0 2024-08-20 07:26:27,740 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 32 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-20 07:26:33,711 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4712400.0, ans=0.0 2024-08-20 07:26:42,566 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4712400.0, ans=0.2 2024-08-20 07:27:16,384 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-20 07:27:25,241 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 40 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-20 07:27:39,390 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.45 vs. limit=15.0 2024-08-20 07:27:46,175 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 11900, loss[loss=0.07889, beats_loss=0.01269, ecapa_loss=0.0001185, whisper_loss=0.06501, over 16553.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01042, ecapa_loss=0.0001402, whisper_loss=0.09065, over 3864271.65 frames. ], batch size: 69, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:27:48,776 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 18 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-20 07:28:03,361 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 28 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-20 07:28:04,875 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 
31 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-20 07:28:27,080 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.34 vs. limit=15.0 2024-08-20 07:28:55,588 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=6.353e-01 2024-08-20 07:29:04,689 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 28 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-20 07:29:23,019 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 11950, loss[loss=0.09065, beats_loss=0.01246, ecapa_loss=0.000115, whisper_loss=0.07704, over 20972.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01042, ecapa_loss=0.0001404, whisper_loss=0.09034, over 3832318.16 frames. ], batch size: 80, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:29:23,371 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 27 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-20 07:29:25,728 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 14 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-20 07:29:29,783 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4713300.0, ans=0.2 2024-08-20 07:29:43,869 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.385e+01 2.581e+01 2.869e+01 3.753e+01, threshold=5.162e+01, percent-clipped=0.0 2024-08-20 07:29:58,059 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 24 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-20 07:30:06,886 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 28 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-20 07:30:20,108 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.99 vs. 
limit=6.0 2024-08-20 07:30:33,346 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0 2024-08-20 07:30:37,334 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.28 vs. limit=8.0 2024-08-20 07:30:46,498 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.69 vs. limit=15.0 2024-08-20 07:30:50,246 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4713700.0, ans=0.125 2024-08-20 07:30:56,479 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4713800.0, ans=0.0 2024-08-20 07:30:57,729 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 12000, loss[loss=0.1073, beats_loss=0.007864, ecapa_loss=0.0001564, whisper_loss=0.09787, over 17188.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01041, ecapa_loss=0.0001398, whisper_loss=0.08977, over 3819878.95 frames. ], batch size: 67, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:30:57,729 INFO [train_multi_KD3.py:1140] (1/4) Computing validation loss 2024-08-20 07:31:33,865 INFO [train_multi_KD3.py:1150] (1/4) Epoch 32, validation on ASR_libri: loss=0.2532, beats_loss=0, ecapa_loss=0.0005087, whisper_loss=0.2481, over 931116.00 frames. 2024-08-20 07:31:55,998 INFO [train_multi_KD3.py:1150] (1/4) Epoch 32, validation on SV_voxceleb1: loss=0.003908, beats_loss=0, ecapa_loss=0.0003908, whisper_loss=0, over 944235.00 frames. 2024-08-20 07:33:38,226 INFO [train_multi_KD3.py:1150] (1/4) Epoch 32, validation on AT_audioset: loss=0.02304, beats_loss=0.02304, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
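The validation block above runs three task-specific passes (ASR_libri, SV_voxceleb1, AT_audioset), and in each pass only the active task head contributes, so the other loss components are logged as 0. The combined loss appears to be a scaled sum of the three components. This is a minimal sketch, assuming the scales from the run config: `beats_loss_scale=1.0` and an ecapa scale of 10.0 (inferred from `scale_10.0` in the `exp_dir` name, not stated explicitly in this excerpt):

```python
# Sketch of the combined multi-KD loss, assuming beats_scale=1.0,
# ecapa_scale=10.0 (inferred from the exp_dir name), whisper_scale=1.0.
def combined_loss(beats_loss, ecapa_loss, whisper_loss,
                  beats_scale=1.0, ecapa_scale=10.0, whisper_scale=1.0):
    return (beats_scale * beats_loss
            + ecapa_scale * ecapa_loss
            + whisper_scale * whisper_loss)

# SV_voxceleb1 validation above: loss=0.003908 with ecapa_loss=0.0003908
# and the other components zero, i.e. exactly 10x the ecapa component.
assert abs(combined_loss(0.0, 0.0003908, 0.0) - 0.003908) < 1e-9

# Training tot_loss at Epoch 32, batch 11850:
# loss=0.1029, beats_loss=0.01039, ecapa_loss=0.0001391, whisper_loss=0.09113
assert abs(combined_loss(0.01039, 0.0001391, 0.09113) - 0.1029) < 5e-4
```

The same reconstruction matches the other `tot_loss` entries in this excerpt to within rounding, which supports the assumed 10.0 ecapa scale.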
2024-08-20 07:33:38,230 INFO [train_multi_KD3.py:1156] (1/4) Maximum memory allocated so far is 30838MB 2024-08-20 07:34:01,356 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4713900.0, ans=0.95 2024-08-20 07:34:05,112 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4713900.0, ans=0.07 2024-08-20 07:34:15,529 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4714000.0, ans=0.1 2024-08-20 07:34:20,482 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4714000.0, ans=0.125 2024-08-20 07:34:20,731 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.34 vs. limit=15.0 2024-08-20 07:34:22,159 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 21 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-20 07:34:29,192 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4714000.0, ans=0.125 2024-08-20 07:35:03,797 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4714200.0, ans=0.0 2024-08-20 07:35:06,512 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 12050, loss[loss=0.1018, beats_loss=0.009444, ecapa_loss=0.0001396, whisper_loss=0.09092, over 22913.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01041, ecapa_loss=0.0001402, whisper_loss=0.08964, over 3817775.93 frames. ], batch size: 92, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:35:07,581 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.12 vs. 
limit=15.0 2024-08-20 07:35:12,050 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 17 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-20 07:35:14,201 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.80 vs. limit=15.0 2024-08-20 07:35:14,911 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4714300.0, ans=0.125 2024-08-20 07:35:20,743 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.82 vs. limit=22.5 2024-08-20 07:35:25,718 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.706e+01 2.124e+01 2.460e+01 2.784e+01 4.386e+01, threshold=4.920e+01, percent-clipped=0.0 2024-08-20 07:35:39,420 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4714400.0, ans=0.125 2024-08-20 07:36:11,796 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 27 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-20 07:36:39,080 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 12100, loss[loss=0.09713, beats_loss=0.01185, ecapa_loss=0.0001625, whisper_loss=0.08366, over 21490.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01048, ecapa_loss=0.0001416, whisper_loss=0.08883, over 3830631.33 frames. 
], batch size: 89, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:36:41,626 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4714800.0, ans=0.0 2024-08-20 07:36:43,702 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4714800.0, ans=0.125 2024-08-20 07:37:22,392 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.87 vs. limit=15.0 2024-08-20 07:37:38,489 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.54 vs. limit=12.0 2024-08-20 07:38:12,238 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4715200.0, ans=0.0 2024-08-20 07:38:23,913 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 12150, loss[loss=0.1398, beats_loss=0.005821, ecapa_loss=0.0001487, whisper_loss=0.1325, over 20605.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01044, ecapa_loss=0.000142, whisper_loss=0.08984, over 3849972.31 frames. ], batch size: 76, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:38:47,680 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.297e+01 2.567e+01 2.958e+01 5.999e+01, threshold=5.133e+01, percent-clipped=2.0 2024-08-20 07:38:53,397 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.52 vs. limit=15.0 2024-08-20 07:39:09,780 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 
18 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-20 07:39:14,598 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4715500.0, ans=0.07 2024-08-20 07:39:16,427 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4715500.0, ans=0.125 2024-08-20 07:39:27,920 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4715600.0, ans=0.125 2024-08-20 07:39:33,153 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4715600.0, ans=0.125 2024-08-20 07:39:38,218 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 30 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-20 07:39:42,756 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.49 vs. limit=15.0 2024-08-20 07:39:54,759 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 9 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-20 07:39:58,961 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 12200, loss[loss=0.1091, beats_loss=0.01015, ecapa_loss=0.0001049, whisper_loss=0.09789, over 19026.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01039, ecapa_loss=0.0001413, whisper_loss=0.08967, over 3811005.13 frames. ], batch size: 72, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:40:07,190 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4715800.0, ans=0.1 2024-08-20 07:40:15,404 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4715900.0, ans=0.07 2024-08-20 07:40:19,347 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 
22 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-20 07:40:20,928 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4715900.0, ans=0.125 2024-08-20 07:40:21,363 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.44 vs. limit=12.0 2024-08-20 07:40:42,860 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4716000.0, ans=0.2 2024-08-20 07:40:49,812 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 21 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-20 07:40:51,688 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 20 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-20 07:40:52,016 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4716100.0, ans=0.125 2024-08-20 07:41:03,129 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. limit=6.0 2024-08-20 07:41:23,375 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.50 vs. limit=12.0 2024-08-20 07:41:25,576 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 12250, loss[loss=0.09974, beats_loss=0.007934, ecapa_loss=0.0001564, whisper_loss=0.09024, over 22305.00 frames. ], tot_loss[loss=0.101, beats_loss=0.0104, ecapa_loss=0.0001417, whisper_loss=0.08916, over 3770569.59 frames. ], batch size: 91, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:41:31,664 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4716300.0, ans=0.125 2024-08-20 07:41:40,833 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 
27 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-20 07:41:45,067 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.278e+01 2.601e+01 2.929e+01 4.392e+01, threshold=5.202e+01, percent-clipped=0.0 2024-08-20 07:41:46,286 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4716400.0, ans=0.2 2024-08-20 07:42:07,338 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=4716500.0, ans=0.2 2024-08-20 07:42:19,537 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 33 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-20 07:42:21,525 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4716600.0, ans=0.0 2024-08-20 07:42:24,747 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 26 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-20 07:42:25,035 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4716600.0, ans=0.125 2024-08-20 07:42:25,046 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4716600.0, ans=0.125 2024-08-20 07:42:26,610 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 22 from LS+wenet, 33 from Vox, 35 fro AS 2024-08-20 07:42:38,261 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 21 from LS+wenet, 27 from Vox, 45 fro AS 2024-08-20 07:42:52,344 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4716800.0, ans=0.0 2024-08-20 07:42:53,964 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 12300, loss[loss=0.1206, beats_loss=0.0085, ecapa_loss=0.0001372, whisper_loss=0.1107, over 24059.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01037, ecapa_loss=0.0001425, whisper_loss=0.08977, over 3830523.49 frames. 
], batch size: 91, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:42:55,722 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 30 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-20 07:43:03,588 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4716800.0, ans=0.125 2024-08-20 07:43:42,223 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 23 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-20 07:44:23,196 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 12350, loss[loss=0.1001, beats_loss=0.01131, ecapa_loss=0.0001531, whisper_loss=0.08725, over 21315.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01044, ecapa_loss=0.000142, whisper_loss=0.08948, over 3822581.29 frames. ], batch size: 87, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:44:23,427 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 20 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-20 07:44:23,700 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4717300.0, ans=0.0 2024-08-20 07:44:41,645 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4717400.0, ans=0.2 2024-08-20 07:44:44,251 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.836e+01 2.339e+01 2.566e+01 2.988e+01 1.086e+02, threshold=5.133e+01, percent-clipped=1.0 2024-08-20 07:44:44,988 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.05 vs. limit=12.0 2024-08-20 07:44:49,745 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 
25 from LS+wenet, 12 from Vox, 13 fro AS 2024-08-20 07:45:09,505 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4717500.0, ans=0.1 2024-08-20 07:45:18,176 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 14 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-20 07:45:24,095 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 20 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-20 07:45:26,012 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4717600.0, ans=0.1 2024-08-20 07:45:35,921 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4717700.0, ans=0.0 2024-08-20 07:45:37,593 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 31 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-20 07:45:44,574 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4717700.0, ans=0.0 2024-08-20 07:45:46,596 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 28 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-20 07:45:49,967 INFO [train_multi_KD3.py:845] (1/4) A total of 96 cuts. 29 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-20 07:45:53,932 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4717800.0, ans=0.0 2024-08-20 07:45:55,823 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 12400, loss[loss=0.09944, beats_loss=0.01209, ecapa_loss=0.0001127, whisper_loss=0.08622, over 18977.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01037, ecapa_loss=0.0001408, whisper_loss=0.08959, over 3805742.61 frames. ], batch size: 73, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:46:01,435 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 
21 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-20 07:46:13,941 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4717900.0, ans=0.2 2024-08-20 07:46:20,654 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4717900.0, ans=0.0 2024-08-20 07:46:24,304 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.912e+01 2024-08-20 07:46:26,610 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.46 vs. limit=15.0 2024-08-20 07:46:50,232 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 07:46:51,961 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4718100.0, ans=0.1 2024-08-20 07:47:06,582 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.60 vs. limit=15.0 2024-08-20 07:47:15,716 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4718200.0, ans=0.125 2024-08-20 07:47:24,631 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 12450, loss[loss=0.11, beats_loss=0.00918, ecapa_loss=0.0001569, whisper_loss=0.09928, over 21673.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01042, ecapa_loss=0.0001402, whisper_loss=0.08951, over 3844658.15 frames. ], batch size: 90, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:47:24,809 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 
16 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-20 07:47:43,887 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.285e+01 2.465e+01 2.724e+01 4.543e+01, threshold=4.931e+01, percent-clipped=0.0 2024-08-20 07:47:51,533 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 21 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-20 07:47:53,475 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 29 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-20 07:47:57,185 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 15 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-20 07:48:05,802 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4718500.0, ans=0.0 2024-08-20 07:48:29,578 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4718600.0, ans=0.05 2024-08-20 07:48:34,346 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 32 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-20 07:48:38,417 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 19 from LS+wenet, 24 from Vox, 16 fro AS 2024-08-20 07:48:48,750 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=4718700.0, ans=6.0 2024-08-20 07:48:58,438 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 12500, loss[loss=0.08983, beats_loss=0.01067, ecapa_loss=0.0001372, whisper_loss=0.07779, over 13972.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01048, ecapa_loss=0.0001399, whisper_loss=0.08929, over 3822834.93 frames. ], batch size: 54, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:50:02,242 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 
28 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-20 07:50:11,296 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4719200.0, ans=0.125 2024-08-20 07:50:28,507 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 12550, loss[loss=0.08209, beats_loss=0.01118, ecapa_loss=0.0001385, whisper_loss=0.06952, over 21636.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01048, ecapa_loss=0.0001403, whisper_loss=0.0893, over 3855636.11 frames. ], batch size: 88, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:50:32,925 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.41 vs. limit=10.0 2024-08-20 07:50:48,468 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.766e+01 2.246e+01 2.517e+01 2.953e+01 4.531e+01, threshold=5.033e+01, percent-clipped=0.0 2024-08-20 07:51:00,636 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.32 vs. limit=22.5 2024-08-20 07:51:05,884 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.09 vs. limit=15.0 2024-08-20 07:51:09,395 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 30 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-20 07:51:11,022 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 24 from LS+wenet, 14 from Vox, 44 fro AS 2024-08-20 07:51:18,286 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 07:51:32,086 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 
34 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-20 07:51:39,426 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4719700.0, ans=0.05 2024-08-20 07:51:59,658 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 12600, loss[loss=0.09862, beats_loss=0.009711, ecapa_loss=0.0001292, whisper_loss=0.08762, over 15628.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01043, ecapa_loss=0.0001399, whisper_loss=0.08977, over 3867156.11 frames. ], batch size: 61, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:52:18,825 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 19 from LS+wenet, 12 from Vox, 40 fro AS 2024-08-20 07:52:32,530 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 28 from LS+wenet, 13 from Vox, 17 fro AS 2024-08-20 07:52:34,856 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 25 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-20 07:52:43,452 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4720000.0, ans=0.125 2024-08-20 07:52:43,491 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4720000.0, ans=0.0 2024-08-20 07:53:18,590 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4720200.0, ans=0.125 2024-08-20 07:53:18,610 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4720200.0, ans=0.1 2024-08-20 07:53:33,458 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 12650, loss[loss=0.08329, beats_loss=0.01296, ecapa_loss=0.0001201, whisper_loss=0.06913, over 21924.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01047, ecapa_loss=0.0001396, whisper_loss=0.08912, over 3847203.88 frames. 
], batch size: 91, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:53:37,527 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4720300.0, ans=0.125 2024-08-20 07:53:39,293 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4720300.0, ans=0.0 2024-08-20 07:53:39,591 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.81 vs. limit=22.5 2024-08-20 07:53:43,436 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.87 vs. limit=5.0 2024-08-20 07:53:53,478 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.384e+01 2.676e+01 2.977e+01 1.190e+02, threshold=5.353e+01, percent-clipped=5.0 2024-08-20 07:53:56,072 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4720400.0, ans=0.125 2024-08-20 07:54:22,917 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4720500.0, ans=0.0 2024-08-20 07:54:39,044 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4720600.0, ans=0.125 2024-08-20 07:54:44,521 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4720700.0, ans=0.125 2024-08-20 07:55:02,705 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 12700, loss[loss=0.1134, beats_loss=0.009174, ecapa_loss=0.0001438, whisper_loss=0.1028, over 17725.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01041, ecapa_loss=0.0001398, whisper_loss=0.08928, over 3857491.35 frames. 
], batch size: 70, lr: 1.90e-03, grad_scale: 5.764607523034235e+17
2024-08-20 07:55:05,325 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.43 vs. limit=15.0
2024-08-20 07:55:11,789 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 29 from LS+wenet, 19 from Vox, 44 from AS
2024-08-20 07:55:56,264 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=4721100.0, ans=0.5
2024-08-20 07:56:07,249 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4721100.0, ans=0.125
2024-08-20 07:56:19,081 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 23 from LS+wenet, 13 from Vox, 15 from AS
2024-08-20 07:56:27,952 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.29 vs. limit=15.0
2024-08-20 07:56:34,115 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 12750, loss[loss=0.1034, beats_loss=0.01159, ecapa_loss=0.000131, whisper_loss=0.09055, over 21832.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01039, ecapa_loss=0.0001387, whisper_loss=0.08991, over 3857427.36 frames. ], batch size: 91, lr: 1.90e-03, grad_scale: 5.764607523034235e+17
2024-08-20 07:56:34,547 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 24 from LS+wenet, 25 from Vox, 32 from AS
2024-08-20 07:56:52,947 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.290e+01 2.492e+01 2.698e+01 4.644e+01, threshold=4.984e+01, percent-clipped=0.0
2024-08-20 07:56:56,998 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4721400.0, ans=0.125
2024-08-20 07:57:04,324 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.80 vs. limit=22.5
2024-08-20 07:57:07,392 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 25 from LS+wenet, 27 from Vox, 42 from AS
2024-08-20 07:57:13,133 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 19 from LS+wenet, 31 from Vox, 33 from AS
2024-08-20 07:57:34,790 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 23 from LS+wenet, 20 from Vox, 27 from AS
2024-08-20 07:57:57,563 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4721700.0, ans=0.09899494936611666
2024-08-20 07:58:01,624 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.13 vs. limit=22.5
2024-08-20 07:58:02,309 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 12800, loss[loss=0.1114, beats_loss=0.01218, ecapa_loss=0.0001344, whisper_loss=0.09786, over 21186.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01032, ecapa_loss=0.0001399, whisper_loss=0.09037, over 3837468.83 frames. ], batch size: 88, lr: 1.90e-03, grad_scale: 5.764607523034235e+17
2024-08-20 07:58:11,831 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 16 from LS+wenet, 24 from Vox, 37 from AS
2024-08-20 07:58:13,613 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4721800.0, ans=0.1
2024-08-20 07:58:24,360 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4721900.0, ans=0.0
2024-08-20 07:58:30,192 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4721900.0, ans=0.125
2024-08-20 07:58:30,196 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4721900.0, ans=0.125
2024-08-20 07:58:32,560 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.96 vs. limit=22.5
2024-08-20 07:58:35,802 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 22 from LS+wenet, 12 from Vox, 29 from AS
2024-08-20 07:58:53,119 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.88 vs. limit=15.0
2024-08-20 07:59:11,710 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4722100.0, ans=0.125
2024-08-20 07:59:14,821 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4722100.0, ans=0.125
2024-08-20 07:59:28,014 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4722200.0, ans=0.1
2024-08-20 07:59:36,532 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 12850, loss[loss=0.1109, beats_loss=0.01025, ecapa_loss=0.0001202, whisper_loss=0.09949, over 21407.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01042, ecapa_loss=0.0001398, whisper_loss=0.08979, over 3858827.45 frames. ], batch size: 85, lr: 1.90e-03, grad_scale: 5.764607523034235e+17
2024-08-20 07:59:42,104 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4722300.0, ans=0.125
2024-08-20 07:59:56,945 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.213e+01 2.468e+01 2.785e+01 8.515e+01, threshold=4.935e+01, percent-clipped=2.0
2024-08-20 07:59:58,798 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4722400.0, ans=0.1
2024-08-20 08:00:02,991 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 37 from LS+wenet, 19 from Vox, 35 from AS
2024-08-20 08:00:11,420 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 19 from LS+wenet, 12 from Vox, 24 from AS
2024-08-20 08:00:18,221 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4722500.0, ans=0.125
2024-08-20 08:00:44,859 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4722600.0, ans=0.125
2024-08-20 08:00:58,071 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4722700.0, ans=0.125
2024-08-20 08:01:00,063 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4722700.0, ans=0.0
2024-08-20 08:01:04,942 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 12900, loss[loss=0.1127, beats_loss=0.0106, ecapa_loss=0.0001284, whisper_loss=0.1008, over 20546.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01042, ecapa_loss=0.0001405, whisper_loss=0.09003, over 3848807.07 frames. ], batch size: 83, lr: 1.90e-03, grad_scale: 5.764607523034235e+17
2024-08-20 08:01:22,077 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 21 from LS+wenet, 22 from Vox, 22 from AS
2024-08-20 08:01:33,131 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4722900.0, ans=0.125
2024-08-20 08:02:01,572 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=4723100.0, ans=0.05
2024-08-20 08:02:18,818 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4723200.0, ans=0.0
2024-08-20 08:02:35,022 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 12950, loss[loss=0.09431, beats_loss=0.006467, ecapa_loss=0.0001959, whisper_loss=0.08589, over 12512.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01044, ecapa_loss=0.0001409, whisper_loss=0.08995, over 3825159.13 frames. ], batch size: 52, lr: 1.90e-03, grad_scale: 5.764607523034235e+17
2024-08-20 08:02:39,674 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.61 vs. limit=15.0
2024-08-20 08:02:56,158 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.727e+01 2.320e+01 2.461e+01 2.898e+01 1.890e+02, threshold=4.922e+01, percent-clipped=4.0
2024-08-20 08:02:57,025 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.69 vs. limit=15.0
2024-08-20 08:03:05,829 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4723400.0, ans=0.125
2024-08-20 08:03:11,262 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 23 from LS+wenet, 23 from Vox, 31 from AS
2024-08-20 08:03:36,550 WARNING [optim.py:496] (1/4) Scaling gradients by 0.012246229685842991, model_norm_threshold=49.221561431884766
2024-08-20 08:03:36,709 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.20, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.214e+06, grad_sumsq=3.009e+08, orig_rms_sq=1.068e-02
2024-08-20 08:03:42,753 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4723600.0, ans=0.0
2024-08-20 08:03:55,205 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4723700.0, ans=0.0
2024-08-20 08:04:07,920 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 13000, loss[loss=0.08602, beats_loss=0.01173, ecapa_loss=0.0001377, whisper_loss=0.07291, over 18067.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01049, ecapa_loss=0.0001415, whisper_loss=0.09044, over 3829187.60 frames. ], batch size: 71, lr: 1.90e-03, grad_scale: 5.764607523034235e+17
2024-08-20 08:04:29,057 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 27 from LS+wenet, 17 from Vox, 41 from AS
2024-08-20 08:04:35,043 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4723900.0, ans=0.125
2024-08-20 08:04:41,074 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.91 vs. limit=15.0
2024-08-20 08:04:42,081 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.088e-02
2024-08-20 08:04:43,646 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 24 from LS+wenet, 22 from Vox, 43 from AS
2024-08-20 08:04:48,442 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4724000.0, ans=0.1
2024-08-20 08:05:22,987 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 29 from LS+wenet, 23 from Vox, 29 from AS
2024-08-20 08:05:23,945 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4724200.0, ans=0.09899494936611666
2024-08-20 08:05:29,197 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-20 08:05:31,490 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=10.16 vs. limit=10.0
2024-08-20 08:05:41,805 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 13050, loss[loss=0.08534, beats_loss=0.008667, ecapa_loss=0.000179, whisper_loss=0.07488, over 12407.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01049, ecapa_loss=0.0001416, whisper_loss=0.09029, over 3828189.91 frames. ], batch size: 51, lr: 1.90e-03, grad_scale: 5.764607523034235e+17
2024-08-20 08:05:45,790 WARNING [optim.py:496] (1/4) Scaling gradients by 0.03136618062853813, model_norm_threshold=49.221561431884766
2024-08-20 08:05:45,947 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.19, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.784e+05, grad_sumsq=1.450e+05, orig_rms_sq=3.300e+00
2024-08-20 08:05:46,525 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4724300.0, ans=0.125
2024-08-20 08:05:47,918 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4724300.0, ans=0.125
2024-08-20 08:06:03,427 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.318e+01 2.513e+01 2.841e+01 4.019e+03, threshold=5.026e+01, percent-clipped=3.0
2024-08-20 08:06:09,933 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4724400.0, ans=0.0
2024-08-20 08:06:16,081 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.20 vs. limit=22.5
2024-08-20 08:06:18,000 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4724400.0, ans=0.125
2024-08-20 08:06:22,070 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4724500.0, ans=0.1
2024-08-20 08:06:38,968 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 30 from LS+wenet, 24 from Vox, 35 from AS
2024-08-20 08:06:56,351 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4724600.0, ans=0.0
2024-08-20 08:07:12,452 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4724700.0, ans=0.1
2024-08-20 08:07:21,153 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 13100, loss[loss=0.09999, beats_loss=0.01188, ecapa_loss=0.0001632, whisper_loss=0.08648, over 21211.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01052, ecapa_loss=0.000141, whisper_loss=0.09017, over 3834876.34 frames. ], batch size: 90, lr: 1.90e-03, grad_scale: 5.764607523034235e+17
2024-08-20 08:07:23,605 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.72 vs. limit=15.0
2024-08-20 08:07:41,172 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=16.96 vs. limit=15.0
2024-08-20 08:07:44,676 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 23 from LS+wenet, 11 from Vox, 24 from AS
2024-08-20 08:07:45,217 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.65 vs. limit=22.5
2024-08-20 08:07:46,780 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.42 vs. limit=22.5
2024-08-20 08:08:01,360 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 23 from LS+wenet, 18 from Vox, 34 from AS
2024-08-20 08:08:03,706 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4725000.0, ans=0.07
2024-08-20 08:08:05,806 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4725000.0, ans=0.2
2024-08-20 08:08:08,626 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.88 vs. limit=15.0
2024-08-20 08:08:21,013 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4725100.0, ans=0.125
2024-08-20 08:08:41,551 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 21 from LS+wenet, 17 from Vox, 24 from AS
2024-08-20 08:08:45,949 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4725200.0, ans=0.0
2024-08-20 08:08:54,638 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 13150, loss[loss=0.0876, beats_loss=0.01115, ecapa_loss=0.0001231, whisper_loss=0.07522, over 18457.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01055, ecapa_loss=0.0001401, whisper_loss=0.0891, over 3834788.71 frames. ], batch size: 73, lr: 1.90e-03, grad_scale: 5.764607523034235e+17
2024-08-20 08:09:08,111 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4725300.0, ans=0.125
2024-08-20 08:09:10,652 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4725300.0, ans=0.125
2024-08-20 08:09:11,093 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.67 vs. limit=12.0
2024-08-20 08:09:16,313 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.495e+01 2.265e+01 2.500e+01 2.860e+01 8.543e+01, threshold=5.000e+01, percent-clipped=2.0
2024-08-20 08:09:28,591 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4725400.0, ans=0.125
2024-08-20 08:09:28,607 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4725400.0, ans=0.125
2024-08-20 08:09:41,838 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4725500.0, ans=0.125
2024-08-20 08:09:42,242 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.65 vs. limit=22.5
2024-08-20 08:09:43,721 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 29 from LS+wenet, 21 from Vox, 35 from AS
2024-08-20 08:09:49,529 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.123e+05
2024-08-20 08:09:51,801 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4725600.0, ans=0.1
2024-08-20 08:09:56,118 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.46 vs. limit=6.0
2024-08-20 08:10:11,788 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4725700.0, ans=0.0
2024-08-20 08:10:18,092 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 22 from LS+wenet, 15 from Vox, 20 from AS
2024-08-20 08:10:30,333 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 31 from LS+wenet, 22 from Vox, 35 from AS
2024-08-20 08:10:31,884 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 13200, loss[loss=0.1169, beats_loss=0.009301, ecapa_loss=0.0001533, whisper_loss=0.106, over 22358.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.0105, ecapa_loss=0.0001402, whisper_loss=0.08916, over 3828550.62 frames. ], batch size: 88, lr: 1.90e-03, grad_scale: 5.764607523034235e+17
2024-08-20 08:11:01,395 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-20 08:11:11,570 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4726000.0, ans=0.125
2024-08-20 08:11:28,511 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 33 from LS+wenet, 15 from Vox, 29 from AS
2024-08-20 08:11:41,921 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4726100.0, ans=0.125
2024-08-20 08:12:04,723 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-20 08:12:05,810 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 13250, loss[loss=0.117, beats_loss=0.009354, ecapa_loss=0.0001402, whisper_loss=0.1063, over 23440.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01048, ecapa_loss=0.0001401, whisper_loss=0.08914, over 3815588.02 frames. ], batch size: 93, lr: 1.90e-03, grad_scale: 5.764607523034235e+17
2024-08-20 08:12:26,219 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.372e+01 2.599e+01 3.015e+01 7.004e+01, threshold=5.197e+01, percent-clipped=3.0
2024-08-20 08:12:36,155 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 25 from LS+wenet, 17 from Vox, 39 from AS
2024-08-20 08:13:03,617 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 35 from LS+wenet, 20 from Vox, 39 from AS
2024-08-20 08:13:12,708 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 26 from LS+wenet, 18 from Vox, 18 from AS
2024-08-20 08:13:13,000 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4726600.0, ans=0.125
2024-08-20 08:13:36,362 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4726700.0, ans=0.125
2024-08-20 08:13:38,028 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4726700.0, ans=0.2
2024-08-20 08:13:40,488 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 13300, loss[loss=0.09666, beats_loss=0.01023, ecapa_loss=0.0001338, whisper_loss=0.0851, over 23054.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01041, ecapa_loss=0.0001412, whisper_loss=0.08982, over 3814696.77 frames. ], batch size: 93, lr: 1.90e-03, grad_scale: 5.764607523034235e+17
2024-08-20 08:13:47,471 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4726800.0, ans=0.0
2024-08-20 08:13:50,923 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.162e+01
2024-08-20 08:14:01,264 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 26 from LS+wenet, 16 from Vox, 50 from AS
2024-08-20 08:14:03,207 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 13 from LS+wenet, 18 from Vox, 28 from AS
2024-08-20 08:14:17,822 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=4727000.0, ans=0.1
2024-08-20 08:14:32,335 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4727000.0, ans=0.125
2024-08-20 08:14:50,828 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4727100.0, ans=0.2
2024-08-20 08:15:14,034 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 13350, loss[loss=0.09282, beats_loss=0.009673, ecapa_loss=0.0001469, whisper_loss=0.08168, over 20878.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01042, ecapa_loss=0.0001414, whisper_loss=0.08917, over 3791744.49 frames. ], batch size: 87, lr: 1.90e-03, grad_scale: 5.764607523034235e+17
2024-08-20 08:15:33,504 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=19.51 vs. limit=22.5
2024-08-20 08:15:34,090 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.328e+01 2.528e+01 2.746e+01 3.166e+02, threshold=5.056e+01, percent-clipped=2.0
2024-08-20 08:15:42,159 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4727400.0, ans=0.0
2024-08-20 08:15:44,033 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4727400.0, ans=0.0
2024-08-20 08:15:44,037 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4727400.0, ans=0.2
2024-08-20 08:15:52,844 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 28 from LS+wenet, 23 from Vox, 29 from AS
2024-08-20 08:15:55,125 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4727500.0, ans=0.125
2024-08-20 08:16:22,139 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4727600.0, ans=0.125
2024-08-20 08:16:33,304 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4727700.0, ans=10.0
2024-08-20 08:16:41,122 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.62 vs. limit=15.0
2024-08-20 08:16:46,248 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 13400, loss[loss=0.0776, beats_loss=0.01204, ecapa_loss=0.0001386, whisper_loss=0.06418, over 12278.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0104, ecapa_loss=0.0001413, whisper_loss=0.08947, over 3803100.08 frames. ], batch size: 51, lr: 1.90e-03, grad_scale: 5.764607523034235e+17
2024-08-20 08:16:54,712 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. limit=6.0
2024-08-20 08:17:08,372 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 31 from LS+wenet, 20 from Vox, 40 from AS
2024-08-20 08:17:09,469 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.26 vs. limit=22.5
2024-08-20 08:17:11,944 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 15 from LS+wenet, 21 from Vox, 37 from AS
2024-08-20 08:17:19,332 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4727900.0, ans=0.1
2024-08-20 08:17:26,436 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 33 from LS+wenet, 24 from Vox, 26 from AS
2024-08-20 08:17:28,600 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4728000.0, ans=0.125
2024-08-20 08:17:43,681 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.24 vs. limit=22.5
2024-08-20 08:18:17,770 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 13450, loss[loss=0.1206, beats_loss=0.008432, ecapa_loss=0.0001704, whisper_loss=0.1105, over 18197.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01039, ecapa_loss=0.0001402, whisper_loss=0.08944, over 3811343.58 frames. ], batch size: 72, lr: 1.90e-03, grad_scale: 5.764607523034235e+17
2024-08-20 08:18:39,119 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.745e+01 2.294e+01 2.576e+01 2.808e+01 3.727e+01, threshold=5.153e+01, percent-clipped=0.0
2024-08-20 08:18:44,444 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 23 from LS+wenet, 20 from Vox, 49 from AS
2024-08-20 08:18:53,706 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4728500.0, ans=0.2
2024-08-20 08:19:14,187 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4728600.0, ans=0.0
2024-08-20 08:19:20,015 INFO [train_multi_KD3.py:845] (1/4) A total of 54 cuts. 16 from LS+wenet, 13 from Vox, 25 from AS
2024-08-20 08:19:29,977 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.095e+05
2024-08-20 08:19:51,640 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 13500, loss[loss=0.1148, beats_loss=0.009468, ecapa_loss=0.000124, whisper_loss=0.1041, over 16966.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01036, ecapa_loss=0.0001409, whisper_loss=0.08997, over 3826772.49 frames. ], batch size: 64, lr: 1.90e-03, grad_scale: 5.764607523034235e+17
2024-08-20 08:19:59,351 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 33 from LS+wenet, 15 from Vox, 46 from AS
2024-08-20 08:20:12,728 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 26 from LS+wenet, 21 from Vox, 26 from AS
2024-08-20 08:20:15,077 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4728900.0, ans=0.125
2024-08-20 08:20:27,702 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4729000.0, ans=0.0
2024-08-20 08:20:35,895 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4729000.0, ans=0.0
2024-08-20 08:21:18,404 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.948e+00
2024-08-20 08:21:21,211 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.55 vs. limit=15.0
2024-08-20 08:21:31,022 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 13550, loss[loss=0.09856, beats_loss=0.00993, ecapa_loss=0.0001529, whisper_loss=0.0871, over 21820.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01038, ecapa_loss=0.0001405, whisper_loss=0.09002, over 3822995.75 frames. ], batch size: 91, lr: 1.90e-03, grad_scale: 5.764607523034235e+17
2024-08-20 08:21:42,142 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4729300.0, ans=0.1
2024-08-20 08:21:47,476 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.98 vs. limit=15.0
2024-08-20 08:21:52,017 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.686e+01 2.276e+01 2.469e+01 2.814e+01 4.223e+01, threshold=4.938e+01, percent-clipped=0.0
2024-08-20 08:21:52,218 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 31 from LS+wenet, 23 from Vox, 40 from AS
2024-08-20 08:22:02,313 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.95 vs. limit=15.0
2024-08-20 08:22:27,929 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.436e+00
2024-08-20 08:22:32,009 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.58 vs. limit=15.0
2024-08-20 08:22:42,104 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.76 vs. limit=15.0
2024-08-20 08:22:59,886 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 32 from LS+wenet, 19 from Vox, 39 from AS
2024-08-20 08:23:05,657 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 13600, loss[loss=0.1104, beats_loss=0.01081, ecapa_loss=0.0001192, whisper_loss=0.09839, over 13684.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01035, ecapa_loss=0.0001414, whisper_loss=0.09009, over 3803681.70 frames. ], batch size: 53, lr: 1.90e-03, grad_scale: 5.764607523034235e+17
2024-08-20 08:23:14,830 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 14 from LS+wenet, 11 from Vox, 30 from AS
2024-08-20 08:23:22,725 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 26 from LS+wenet, 19 from Vox, 42 from AS
2024-08-20 08:23:46,372 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4730000.0, ans=0.2
2024-08-20 08:24:11,696 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=4730100.0, ans=15.0
2024-08-20 08:24:22,562 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 30 from LS+wenet, 21 from Vox, 37 from AS
2024-08-20 08:24:29,650 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4730200.0, ans=0.1
2024-08-20 08:24:43,162 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 13650, loss[loss=0.106, beats_loss=0.009485, ecapa_loss=0.0001496, whisper_loss=0.09499, over 22808.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01036, ecapa_loss=0.0001407, whisper_loss=0.08992, over 3797724.61 frames. ], batch size: 90, lr: 1.90e-03, grad_scale: 1.152921504606847e+18
2024-08-20 08:24:43,696 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4730300.0, ans=0.1
2024-08-20 08:24:45,437 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=4730300.0, ans=10.0
2024-08-20 08:24:49,373 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4730300.0, ans=0.2
2024-08-20 08:24:56,778 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=4730300.0, ans=0.05
2024-08-20 08:25:03,841 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.647e+01 2.366e+01 2.588e+01 2.806e+01 4.523e+02, threshold=5.175e+01, percent-clipped=1.0
2024-08-20 08:25:07,884 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4730400.0, ans=0.04949747468305833
2024-08-20 08:25:17,004 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 17 from LS+wenet, 25 from Vox, 23 from AS
2024-08-20 08:25:27,722 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 32 from LS+wenet, 16 from Vox, 43 from AS
2024-08-20 08:25:27,929 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4730500.0, ans=0.125
2024-08-20 08:25:34,698 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 28 from LS+wenet, 27 from Vox, 32 from AS
2024-08-20 08:25:49,521 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 32 from LS+wenet, 23 from Vox, 36 from AS
2024-08-20 08:25:55,421 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4730700.0, ans=0.1
2024-08-20 08:26:12,420 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4730700.0, ans=0.125
2024-08-20 08:26:15,096 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 13700, loss[loss=0.09526, beats_loss=0.01013, ecapa_loss=0.0001428, whisper_loss=0.0837, over 21973.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01021, ecapa_loss=0.0001421, whisper_loss=0.09136, over 3821774.74 frames. ], batch size: 90, lr: 1.90e-03, grad_scale: 1.152921504606847e+18
2024-08-20 08:26:25,701 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4730800.0, ans=0.1
2024-08-20 08:26:33,374 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4730900.0, ans=0.125
2024-08-20 08:26:40,970 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4730900.0, ans=0.125
2024-08-20 08:26:52,355 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4731000.0, ans=0.2
2024-08-20 08:27:00,821 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 from AS
2024-08-20 08:27:06,850 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4731000.0, ans=0.0
2024-08-20 08:27:16,132 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4731100.0, ans=0.125
2024-08-20 08:27:29,718 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 21 from LS+wenet, 18 from Vox, 27 from AS
2024-08-20 08:27:46,293 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.32 vs. limit=15.0
2024-08-20 08:27:48,697 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 13750, loss[loss=0.1081, beats_loss=0.009976, ecapa_loss=0.0001244, whisper_loss=0.09688, over 23383.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01026, ecapa_loss=0.0001419, whisper_loss=0.0914, over 3839552.82 frames. ], batch size: 91, lr: 1.90e-03, grad_scale: 5.764607523034235e+17
2024-08-20 08:27:53,048 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4731300.0, ans=0.0
2024-08-20 08:27:54,687 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4731300.0, ans=0.125
2024-08-20 08:28:00,021 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 20 from LS+wenet, 14 from Vox, 34 from AS
2024-08-20 08:28:10,306 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.836e+01 2.242e+01 2.511e+01 2.832e+01 4.850e+01, threshold=5.022e+01, percent-clipped=0.0
2024-08-20 08:28:17,178 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4731400.0, ans=0.0
2024-08-20 08:28:18,720 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 28 from LS+wenet, 26 from Vox, 39 from AS
2024-08-20 08:28:40,492 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.07 vs. limit=15.0
2024-08-20 08:29:22,248 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 13800, loss[loss=0.1185, beats_loss=0.007649, ecapa_loss=0.0001741, whisper_loss=0.1091, over 18604.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01034, ecapa_loss=0.0001402, whisper_loss=0.09079, over 3805466.05 frames. ], batch size: 74, lr: 1.90e-03, grad_scale: 5.764607523034235e+17
2024-08-20 08:29:29,513 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.21 vs. limit=10.0
2024-08-20 08:29:37,356 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4731800.0, ans=0.125
2024-08-20 08:30:10,560 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 32 from LS+wenet, 17 from Vox, 43 from AS
2024-08-20 08:30:12,740 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.99 vs. limit=15.0
2024-08-20 08:30:26,341 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 32 from LS+wenet, 20 from Vox, 37 from AS
2024-08-20 08:30:31,851 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4732100.0, ans=0.2
2024-08-20 08:30:35,434 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4732200.0, ans=0.125
2024-08-20 08:30:48,111 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 26 from LS+wenet, 15 from Vox, 26 from AS
2024-08-20 08:30:50,275 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.70 vs. limit=15.0
2024-08-20 08:30:51,737 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 23 from LS+wenet, 20 from Vox, 50 from AS
2024-08-20 08:30:54,832 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 13850, loss[loss=0.09012, beats_loss=0.01065, ecapa_loss=0.0001678, whisper_loss=0.07779, over 13316.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01037, ecapa_loss=0.0001408, whisper_loss=0.09028, over 3809730.88 frames. ], batch size: 55, lr: 1.90e-03, grad_scale: 5.764607523034235e+17
2024-08-20 08:30:54,960 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 30 from LS+wenet, 18 from Vox, 42 from AS
2024-08-20 08:30:56,749 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 34 from LS+wenet, 21 from Vox, 27 from AS
2024-08-20 08:31:15,855 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.631e+01 2.257e+01 2.391e+01 2.623e+01 3.979e+01, threshold=4.782e+01, percent-clipped=0.0
2024-08-20 08:31:16,100 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts.
23 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-20 08:31:52,056 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4732600.0, ans=0.0 2024-08-20 08:31:58,882 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4732600.0, ans=0.125 2024-08-20 08:32:07,707 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 21 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-20 08:32:18,302 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4732700.0, ans=0.0 2024-08-20 08:32:23,900 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4732800.0, ans=0.2 2024-08-20 08:32:26,081 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 13900, loss[loss=0.1142, beats_loss=0.01116, ecapa_loss=0.0001342, whisper_loss=0.1017, over 22403.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01042, ecapa_loss=0.0001403, whisper_loss=0.08991, over 3818122.29 frames. ], batch size: 89, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:32:37,372 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.37 vs. limit=15.0 2024-08-20 08:32:53,106 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4732900.0, ans=0.125 2024-08-20 08:33:06,823 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 28 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-20 08:33:08,912 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 35 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-20 08:33:22,763 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 
23 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-20 08:33:40,276 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4733200.0, ans=0.125 2024-08-20 08:33:54,070 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 34 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-20 08:33:55,665 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 13950, loss[loss=0.1173, beats_loss=0.009758, ecapa_loss=0.0001306, whisper_loss=0.1062, over 22077.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01046, ecapa_loss=0.0001406, whisper_loss=0.08978, over 3818146.94 frames. ], batch size: 87, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:34:05,680 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4733300.0, ans=0.125 2024-08-20 08:34:17,740 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.740e+01 2.254e+01 2.490e+01 2.803e+01 3.566e+01, threshold=4.981e+01, percent-clipped=0.0 2024-08-20 08:34:18,720 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2024-08-20 08:34:24,218 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 25 from LS+wenet, 26 from Vox, 25 fro AS 2024-08-20 08:34:40,947 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.78 vs. limit=15.0 2024-08-20 08:34:42,369 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 36 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-20 08:34:52,913 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 29 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-20 08:35:12,775 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 
28 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-20 08:35:23,797 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4733700.0, ans=0.0 2024-08-20 08:35:24,129 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.46 vs. limit=22.5 2024-08-20 08:35:25,992 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4733800.0, ans=0.1 2024-08-20 08:35:26,932 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 14000, loss[loss=0.09537, beats_loss=0.01031, ecapa_loss=0.0001645, whisper_loss=0.08342, over 18823.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01044, ecapa_loss=0.0001412, whisper_loss=0.09015, over 3834224.97 frames. ], batch size: 77, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:35:35,095 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.64 vs. limit=15.0 2024-08-20 08:35:36,123 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 08:35:46,828 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.92 vs. limit=22.5 2024-08-20 08:35:50,956 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 27 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-20 08:35:59,688 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4733900.0, ans=0.125 2024-08-20 08:36:03,511 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.13 vs. limit=12.0 2024-08-20 08:36:11,803 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 
27 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-20 08:36:24,436 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 19 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-20 08:36:46,672 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4734200.0, ans=0.0 2024-08-20 08:37:00,704 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 14050, loss[loss=0.1166, beats_loss=0.007734, ecapa_loss=0.0001727, whisper_loss=0.1071, over 19810.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01049, ecapa_loss=0.0001408, whisper_loss=0.08949, over 3845902.81 frames. ], batch size: 80, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:37:01,714 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 30 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-20 08:37:02,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4734300.0, ans=0.125 2024-08-20 08:37:03,862 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4734300.0, ans=0.125 2024-08-20 08:37:22,151 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.58 vs. limit=15.0 2024-08-20 08:37:22,479 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.739e+01 2.283e+01 2.494e+01 2.922e+01 5.594e+01, threshold=4.987e+01, percent-clipped=2.0 2024-08-20 08:37:37,710 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 30 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-20 08:37:39,782 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 24 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-20 08:37:53,109 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 18 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-20 08:38:08,904 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 
28 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-20 08:38:15,898 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 23 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-20 08:38:27,094 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4734700.0, ans=0.0 2024-08-20 08:38:31,622 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 14100, loss[loss=0.1092, beats_loss=0.009006, ecapa_loss=0.000159, whisper_loss=0.0986, over 21891.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01052, ecapa_loss=0.0001403, whisper_loss=0.08912, over 3849232.95 frames. ], batch size: 91, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:38:49,377 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 21 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-20 08:38:51,198 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4734900.0, ans=0.125 2024-08-20 08:39:00,716 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4734900.0, ans=0.2 2024-08-20 08:39:09,418 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4735000.0, ans=0.0 2024-08-20 08:39:16,168 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4735000.0, ans=0.0 2024-08-20 08:39:57,715 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.57 vs. limit=22.5 2024-08-20 08:40:01,883 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 14150, loss[loss=0.1129, beats_loss=0.007629, ecapa_loss=0.0001413, whisper_loss=0.1038, over 17975.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01049, ecapa_loss=0.0001396, whisper_loss=0.08888, over 3824962.38 frames. 
], batch size: 69, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:40:13,901 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4735300.0, ans=0.125 2024-08-20 08:40:15,214 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4735300.0, ans=0.1 2024-08-20 08:40:22,948 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.235e+01 2.465e+01 2.825e+01 7.434e+01, threshold=4.929e+01, percent-clipped=1.0 2024-08-20 08:40:41,944 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4735500.0, ans=0.125 2024-08-20 08:40:47,384 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4735500.0, ans=0.0 2024-08-20 08:40:54,639 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.91 vs. limit=22.5 2024-08-20 08:41:10,168 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4735600.0, ans=0.09899494936611666 2024-08-20 08:41:26,215 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.81 vs. limit=15.0 2024-08-20 08:41:30,423 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 14200, loss[loss=0.09885, beats_loss=0.01216, ecapa_loss=0.000149, whisper_loss=0.0852, over 21873.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01049, ecapa_loss=0.0001401, whisper_loss=0.08925, over 3836260.06 frames. 
], batch size: 90, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:41:41,445 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4735800.0, ans=0.0 2024-08-20 08:42:09,679 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 33 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-20 08:42:12,608 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 35 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-20 08:42:27,415 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 27 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-20 08:42:28,013 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4736100.0, ans=0.2 2024-08-20 08:42:31,767 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.932e-02 2024-08-20 08:42:31,845 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4736100.0, ans=0.1 2024-08-20 08:42:37,838 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4736100.0, ans=0.125 2024-08-20 08:43:02,450 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 14250, loss[loss=0.07532, beats_loss=0.01105, ecapa_loss=0.0001453, whisper_loss=0.06281, over 15789.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01044, ecapa_loss=0.0001396, whisper_loss=0.08963, over 3843677.51 frames. 
], batch size: 65, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:43:08,110 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4736300.0, ans=0.0 2024-08-20 08:43:24,165 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.313e+01 2.520e+01 2.754e+01 4.470e+01, threshold=5.041e+01, percent-clipped=0.0 2024-08-20 08:43:30,473 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4736400.0, ans=0.1 2024-08-20 08:44:04,344 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4736600.0, ans=0.0 2024-08-20 08:44:13,066 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4736600.0, ans=0.0 2024-08-20 08:44:35,088 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 14300, loss[loss=0.1067, beats_loss=0.01155, ecapa_loss=0.0001227, whisper_loss=0.09395, over 21355.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01043, ecapa_loss=0.0001391, whisper_loss=0.08975, over 3824002.41 frames. ], batch size: 82, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:44:52,836 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 28 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-20 08:44:57,234 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4736900.0, ans=0.1 2024-08-20 08:44:57,238 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4736900.0, ans=0.2 2024-08-20 08:45:22,681 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 
20 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-20 08:45:26,285 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4737000.0, ans=0.2 2024-08-20 08:45:26,449 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.01 vs. limit=22.5 2024-08-20 08:45:36,263 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4737100.0, ans=0.125 2024-08-20 08:45:37,945 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 23 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-20 08:45:52,100 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 29 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-20 08:46:02,216 INFO [train_multi_KD3.py:845] (1/4) A total of 52 cuts. 19 from LS+wenet, 17 from Vox, 16 fro AS 2024-08-20 08:46:05,495 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 14350, loss[loss=0.09578, beats_loss=0.01059, ecapa_loss=0.0001567, whisper_loss=0.08362, over 21673.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01046, ecapa_loss=0.0001391, whisper_loss=0.08899, over 3801987.88 frames. ], batch size: 89, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:46:14,801 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 33 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-20 08:46:26,333 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.915e+01 2.359e+01 2.648e+01 3.006e+01 2.772e+02, threshold=5.296e+01, percent-clipped=2.0 2024-08-20 08:46:42,588 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.00 vs. 
limit=10.0 2024-08-20 08:46:45,467 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4737500.0, ans=0.0 2024-08-20 08:47:01,304 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4737600.0, ans=0.1 2024-08-20 08:47:26,975 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 19 from LS+wenet, 29 from Vox, 24 fro AS 2024-08-20 08:47:27,210 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4737700.0, ans=0.125 2024-08-20 08:47:32,699 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 14400, loss[loss=0.1056, beats_loss=0.01017, ecapa_loss=0.0001036, whisper_loss=0.09437, over 13609.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01039, ecapa_loss=0.0001398, whisper_loss=0.08956, over 3781194.09 frames. ], batch size: 50, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:47:36,253 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 26 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-20 08:48:02,691 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.25 vs. limit=22.5 2024-08-20 08:48:26,161 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.20 vs. limit=12.0 2024-08-20 08:48:42,439 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 14 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-20 08:49:02,193 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4738300.0, ans=0.0 2024-08-20 08:49:03,626 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 14450, loss[loss=0.1035, beats_loss=0.01159, ecapa_loss=0.0001207, whisper_loss=0.09075, over 22796.00 frames. 
], tot_loss[loss=0.1011, beats_loss=0.01046, ecapa_loss=0.0001388, whisper_loss=0.08924, over 3784891.90 frames. ], batch size: 94, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:49:06,094 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.30 vs. limit=15.0 2024-08-20 08:49:08,548 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 34 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-20 08:49:21,487 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 16 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-20 08:49:22,852 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 39 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-20 08:49:23,004 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4738400.0, ans=0.125 2024-08-20 08:49:24,727 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.800e+01 2.293e+01 2.479e+01 2.732e+01 7.579e+01, threshold=4.957e+01, percent-clipped=1.0 2024-08-20 08:49:29,638 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4738400.0, ans=0.125 2024-08-20 08:49:46,648 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 17 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-20 08:50:06,883 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 17 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-20 08:50:08,817 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4738600.0, ans=0.0 2024-08-20 08:50:28,863 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.50 vs. 
limit=15.0 2024-08-20 08:50:37,006 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 14500, loss[loss=0.1188, beats_loss=0.006174, ecapa_loss=0.000128, whisper_loss=0.1114, over 14776.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01039, ecapa_loss=0.000139, whisper_loss=0.08953, over 3762927.00 frames. ], batch size: 52, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:50:47,560 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4738800.0, ans=0.0 2024-08-20 08:50:48,065 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.43 vs. limit=10.0 2024-08-20 08:50:49,277 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4738800.0, ans=0.125 2024-08-20 08:50:53,056 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 16 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-20 08:51:03,026 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4738900.0, ans=0.125 2024-08-20 08:51:10,836 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4738900.0, ans=0.125 2024-08-20 08:51:26,752 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 
25 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-20 08:51:32,383 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4739000.0, ans=0.1 2024-08-20 08:51:57,715 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.563e-03 2024-08-20 08:52:01,375 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4739200.0, ans=0.125 2024-08-20 08:52:03,411 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4739200.0, ans=0.0 2024-08-20 08:52:11,787 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 14550, loss[loss=0.07969, beats_loss=0.008026, ecapa_loss=0.0001661, whisper_loss=0.07001, over 14305.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01031, ecapa_loss=0.0001393, whisper_loss=0.08945, over 3760557.22 frames. ], batch size: 59, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:52:23,675 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 19 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-20 08:52:33,082 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.11 vs. limit=15.0 2024-08-20 08:52:34,250 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.256e+01 2.477e+01 2.723e+01 4.705e+01, threshold=4.954e+01, percent-clipped=0.0 2024-08-20 08:52:39,226 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4739400.0, ans=0.125 2024-08-20 08:53:06,250 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.06 vs. limit=15.0 2024-08-20 08:53:23,616 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 
24 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-20 08:53:36,533 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 12 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-20 08:53:38,310 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4739700.0, ans=0.125 2024-08-20 08:53:44,327 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 14600, loss[loss=0.1055, beats_loss=0.01201, ecapa_loss=0.0001144, whisper_loss=0.09232, over 17265.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01026, ecapa_loss=0.0001402, whisper_loss=0.08967, over 3730834.44 frames. ], batch size: 65, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:54:18,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4739900.0, ans=0.125 2024-08-20 08:54:22,304 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=4740000.0, ans=0.05 2024-08-20 08:54:43,254 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4740100.0, ans=0.125 2024-08-20 08:54:43,264 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4740100.0, ans=0.125 2024-08-20 08:54:43,490 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.84 vs. limit=15.0 2024-08-20 08:55:05,016 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4740200.0, ans=0.125 2024-08-20 08:55:16,415 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 14650, loss[loss=0.08664, beats_loss=0.01055, ecapa_loss=0.0001464, whisper_loss=0.07462, over 22647.00 frames. 
], tot_loss[loss=0.1008, beats_loss=0.01034, ecapa_loss=0.0001402, whisper_loss=0.08907, over 3781339.80 frames. ], batch size: 94, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:55:33,053 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 25 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-20 08:55:36,480 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 22 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-20 08:55:38,305 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.329e+01 2.529e+01 2.848e+01 4.887e+01, threshold=5.058e+01, percent-clipped=0.0 2024-08-20 08:55:38,750 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 32 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-20 08:55:50,847 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.46 vs. limit=22.5 2024-08-20 08:56:14,048 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 34 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-20 08:56:21,985 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.22 vs. limit=15.0 2024-08-20 08:56:36,210 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4740700.0, ans=0.125 2024-08-20 08:56:45,541 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 14700, loss[loss=0.1092, beats_loss=0.007128, ecapa_loss=0.0001818, whisper_loss=0.1003, over 15647.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01027, ecapa_loss=0.0001417, whisper_loss=0.0902, over 3784106.49 frames. ], batch size: 63, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:57:07,259 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 21 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-20 08:57:35,137 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 
22 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-20 08:58:15,522 INFO [train_multi_KD3.py:1117] (1/4) Epoch 32, batch 14750, loss[loss=0.1013, beats_loss=0.01362, ecapa_loss=0.0001181, whisper_loss=0.08652, over 22807.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01031, ecapa_loss=0.0001426, whisper_loss=0.08969, over 3792004.32 frames. ], batch size: 90, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:58:26,318 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 23 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-20 08:58:28,572 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.99 vs. limit=15.0 2024-08-20 08:58:36,594 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.385e+01 2.604e+01 3.059e+01 5.323e+01, threshold=5.208e+01, percent-clipped=1.0 2024-08-20 08:58:45,621 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 29 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-20 08:59:00,221 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.60 vs. limit=15.0 2024-08-20 08:59:06,292 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 27 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-20 08:59:06,602 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4741600.0, ans=0.2 2024-08-20 08:59:17,383 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4741600.0, ans=0.125 2024-08-20 08:59:20,390 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 
17 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-20 08:59:31,182 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=4741700.0, ans=10.0 2024-08-20 08:59:32,444 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.73 vs. limit=22.5 2024-08-20 08:59:33,868 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 33 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-20 09:00:13,078 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 0, loss[loss=0.09958, beats_loss=0.01019, ecapa_loss=0.000145, whisper_loss=0.08794, over 17367.00 frames. ], tot_loss[loss=0.09958, beats_loss=0.01019, ecapa_loss=0.000145, whisper_loss=0.08794, over 17367.00 frames. ], batch size: 67, lr: 1.87e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:00:13,079 INFO [train_multi_KD3.py:1140] (1/4) Computing validation loss 2024-08-20 09:00:48,216 INFO [train_multi_KD3.py:1150] (1/4) Epoch 33, validation on ASR_libri: loss=0.2542, beats_loss=0, ecapa_loss=0.0005003, whisper_loss=0.2492, over 931116.00 frames. 2024-08-20 09:01:09,186 INFO [train_multi_KD3.py:1150] (1/4) Epoch 33, validation on SV_voxceleb1: loss=0.003963, beats_loss=0, ecapa_loss=0.0003963, whisper_loss=0, over 944235.00 frames. 2024-08-20 09:02:51,240 INFO [train_multi_KD3.py:1150] (1/4) Epoch 33, validation on AT_audioset: loss=0.02307, beats_loss=0.02307, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-20 09:02:51,243 INFO [train_multi_KD3.py:1156] (1/4) Maximum memory allocated so far is 30838MB 2024-08-20 09:02:52,897 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 
27 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-20 09:03:56,488 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4741980.0, ans=0.2 2024-08-20 09:03:56,586 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4741980.0, ans=0.0 2024-08-20 09:04:14,581 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4742080.0, ans=0.2 2024-08-20 09:04:19,451 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4742080.0, ans=0.125 2024-08-20 09:04:32,254 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4742180.0, ans=0.07 2024-08-20 09:04:37,940 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4742180.0, ans=0.0 2024-08-20 09:04:55,996 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 30 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-20 09:04:57,994 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 50, loss[loss=0.1126, beats_loss=0.005864, ecapa_loss=0.0001468, whisper_loss=0.1053, over 20376.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.009579, ecapa_loss=0.0001388, whisper_loss=0.09043, over 876724.25 frames. ], batch size: 77, lr: 1.87e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:05:00,386 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 25 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-20 09:05:26,156 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.71 vs. 
limit=12.0 2024-08-20 09:05:31,505 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.003e+01 2.495e+01 2.772e+01 3.142e+01 4.372e+01, threshold=5.543e+01, percent-clipped=0.0 2024-08-20 09:05:37,973 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.027e-01 2024-08-20 09:05:54,860 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 15 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-20 09:05:59,200 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 24 from LS+wenet, 10 from Vox, 21 fro AS 2024-08-20 09:06:34,835 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4742680.0, ans=0.1 2024-08-20 09:06:45,902 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4742680.0, ans=0.125 2024-08-20 09:06:53,975 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 100, loss[loss=0.09509, beats_loss=0.01141, ecapa_loss=0.0001227, whisper_loss=0.08245, over 22728.00 frames. ], tot_loss[loss=0.09926, beats_loss=0.009256, ecapa_loss=0.0001402, whisper_loss=0.0886, over 1507610.71 frames. ], batch size: 89, lr: 1.87e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:07:08,144 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4742780.0, ans=0.125 2024-08-20 09:07:09,037 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 22 from LS+wenet, 14 from Vox, 19 fro AS 2024-08-20 09:07:21,648 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=11.04 vs. limit=12.0 2024-08-20 09:07:42,520 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4742980.0, ans=0.1 2024-08-20 09:07:59,032 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 
17 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-20 09:08:25,388 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 36 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-20 09:08:36,861 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4743180.0, ans=0.125 2024-08-20 09:08:43,848 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 150, loss[loss=0.1003, beats_loss=0.01205, ecapa_loss=0.0001147, whisper_loss=0.08709, over 21917.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.009103, ecapa_loss=0.0001421, whisper_loss=0.09034, over 1994439.74 frames. ], batch size: 83, lr: 1.87e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:09:01,991 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 16 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-20 09:09:04,329 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 25 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-20 09:09:04,650 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4743380.0, ans=0.125 2024-08-20 09:09:06,038 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4743380.0, ans=0.2 2024-08-20 09:09:11,140 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.005e+01 2.463e+01 2.692e+01 3.124e+01 4.669e+01, threshold=5.384e+01, percent-clipped=0.0 2024-08-20 09:09:40,770 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.69 vs. limit=15.0 2024-08-20 09:09:51,133 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=4743580.0, ans=0.5 2024-08-20 09:09:57,947 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 29 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-20 09:10:03,491 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 
29 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-20 09:10:04,195 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4743680.0, ans=0.125 2024-08-20 09:10:17,688 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 200, loss[loss=0.09816, beats_loss=0.009338, ecapa_loss=0.0001169, whisper_loss=0.08766, over 16010.00 frames. ], tot_loss[loss=0.09986, beats_loss=0.009508, ecapa_loss=0.0001413, whisper_loss=0.08894, over 2364208.76 frames. ], batch size: 60, lr: 1.87e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:10:35,191 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.05 vs. limit=15.0 2024-08-20 09:10:41,616 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4743880.0, ans=0.0 2024-08-20 09:10:48,663 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4743880.0, ans=0.125 2024-08-20 09:11:01,137 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4743980.0, ans=0.2 2024-08-20 09:11:05,630 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 28 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-20 09:11:08,289 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4743980.0, ans=0.125 2024-08-20 09:11:10,127 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4744080.0, ans=0.125 2024-08-20 09:11:45,346 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 250, loss[loss=0.08832, beats_loss=0.01037, ecapa_loss=0.000139, whisper_loss=0.07655, over 13953.00 frames. ], tot_loss[loss=0.09952, beats_loss=0.009859, ecapa_loss=0.0001415, whisper_loss=0.08825, over 2672482.68 frames. 
], batch size: 54, lr: 1.87e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:11:53,129 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4744280.0, ans=0.0 2024-08-20 09:11:54,868 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4744280.0, ans=0.125 2024-08-20 09:12:09,771 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.364e+01 2.600e+01 2.936e+01 1.943e+02, threshold=5.200e+01, percent-clipped=2.0 2024-08-20 09:12:46,079 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4744580.0, ans=0.0 2024-08-20 09:13:02,625 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4744680.0, ans=0.125 2024-08-20 09:13:09,766 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4744680.0, ans=0.1 2024-08-20 09:13:13,944 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 300, loss[loss=0.08903, beats_loss=0.01249, ecapa_loss=0.0001277, whisper_loss=0.07526, over 22453.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.009899, ecapa_loss=0.0001413, whisper_loss=0.08956, over 2900761.15 frames. 
], batch size: 90, lr: 1.87e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:13:32,521 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=4744880.0, ans=0.05 2024-08-20 09:14:30,239 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4745180.0, ans=0.125 2024-08-20 09:14:32,310 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4745180.0, ans=0.025 2024-08-20 09:14:43,501 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 350, loss[loss=0.09481, beats_loss=0.008587, ecapa_loss=0.0001821, whisper_loss=0.0844, over 17971.00 frames. ], tot_loss[loss=0.09969, beats_loss=0.009979, ecapa_loss=0.0001417, whisper_loss=0.08829, over 3039611.24 frames. ], batch size: 73, lr: 1.87e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:14:58,371 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.96 vs. limit=15.0 2024-08-20 09:15:08,048 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.237e+01 2.517e+01 2.824e+01 3.334e+02, threshold=5.035e+01, percent-clipped=1.0 2024-08-20 09:15:11,008 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4745380.0, ans=0.1 2024-08-20 09:15:25,798 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4745480.0, ans=0.1 2024-08-20 09:15:49,873 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4745580.0, ans=0.1 2024-08-20 09:15:50,873 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 
30 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-20 09:16:03,748 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4745680.0, ans=0.0 2024-08-20 09:16:09,620 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4745680.0, ans=0.2 2024-08-20 09:16:14,198 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 22 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-20 09:16:15,547 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 400, loss[loss=0.1109, beats_loss=0.009392, ecapa_loss=0.00014, whisper_loss=0.1001, over 18326.00 frames. ], tot_loss[loss=0.09989, beats_loss=0.01013, ecapa_loss=0.0001411, whisper_loss=0.08835, over 3212024.51 frames. ], batch size: 70, lr: 1.87e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:16:16,451 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4745780.0, ans=0.0 2024-08-20 09:16:19,710 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4745780.0, ans=0.125 2024-08-20 09:16:19,739 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=4745780.0, ans=0.1 2024-08-20 09:16:29,713 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4745780.0, ans=0.125 2024-08-20 09:16:46,047 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.88 vs. limit=22.5 2024-08-20 09:16:47,866 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.70 vs. 
limit=22.5 2024-08-20 09:16:49,320 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4745880.0, ans=0.1 2024-08-20 09:16:54,921 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4745980.0, ans=0.125 2024-08-20 09:17:02,419 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 28 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-20 09:17:07,893 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 18 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-20 09:17:27,409 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4746080.0, ans=0.125 2024-08-20 09:17:28,603 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 26 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-20 09:17:31,045 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4746180.0, ans=0.125 2024-08-20 09:17:47,693 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 450, loss[loss=0.107, beats_loss=0.008781, ecapa_loss=0.0001572, whisper_loss=0.09665, over 17775.00 frames. ], tot_loss[loss=0.1, beats_loss=0.01018, ecapa_loss=0.0001416, whisper_loss=0.08846, over 3325938.12 frames. ], batch size: 71, lr: 1.87e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:18:09,332 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.36 vs. limit=12.0 2024-08-20 09:18:12,016 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.619e+01 2.270e+01 2.468e+01 2.712e+01 4.275e+01, threshold=4.935e+01, percent-clipped=0.0 2024-08-20 09:18:17,635 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 
30 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-20 09:18:19,454 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4746380.0, ans=0.125 2024-08-20 09:18:22,864 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4746480.0, ans=0.125 2024-08-20 09:18:22,937 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4746480.0, ans=0.125 2024-08-20 09:18:43,862 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 09:18:45,772 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4746580.0, ans=0.0 2024-08-20 09:18:51,775 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.27 vs. limit=15.0 2024-08-20 09:19:12,168 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 22 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-20 09:19:18,907 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 500, loss[loss=0.1026, beats_loss=0.009785, ecapa_loss=0.0001469, whisper_loss=0.09134, over 16864.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.0102, ecapa_loss=0.0001408, whisper_loss=0.08868, over 3441494.54 frames. ], batch size: 70, lr: 1.87e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:19:32,393 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 27 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-20 09:19:47,321 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 20 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-20 09:20:19,834 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.85 vs. 
limit=6.0 2024-08-20 09:20:45,318 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4747180.0, ans=0.0 2024-08-20 09:20:50,230 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 550, loss[loss=0.101, beats_loss=0.01199, ecapa_loss=0.0001432, whisper_loss=0.08761, over 21810.00 frames. ], tot_loss[loss=0.09982, beats_loss=0.01028, ecapa_loss=0.0001387, whisper_loss=0.08816, over 3501833.02 frames. ], batch size: 89, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:20:52,554 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4747280.0, ans=0.1 2024-08-20 09:20:57,837 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.08 vs. limit=15.0 2024-08-20 09:20:58,118 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.15 vs. limit=6.0 2024-08-20 09:21:06,526 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4747380.0, ans=0.2 2024-08-20 09:21:06,785 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.58 vs. limit=12.0 2024-08-20 09:21:08,234 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4747380.0, ans=0.0 2024-08-20 09:21:14,631 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.750e+01 2.259e+01 2.517e+01 2.843e+01 4.116e+01, threshold=5.034e+01, percent-clipped=0.0 2024-08-20 09:21:44,003 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 
22 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-20 09:22:11,954 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4747680.0, ans=0.125 2024-08-20 09:22:20,926 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4747780.0, ans=0.1 2024-08-20 09:22:22,667 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 600, loss[loss=0.0715, beats_loss=0.01329, ecapa_loss=0.0001365, whisper_loss=0.05684, over 20604.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01023, ecapa_loss=0.0001394, whisper_loss=0.08903, over 3564718.90 frames. ], batch size: 87, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:22:30,747 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.82 vs. limit=12.0 2024-08-20 09:22:36,992 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.82 vs. limit=15.0 2024-08-20 09:23:14,892 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4747980.0, ans=0.1 2024-08-20 09:23:19,978 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 25 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-20 09:23:22,218 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4748080.0, ans=0.0 2024-08-20 09:23:25,578 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 19 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-20 09:23:52,932 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 650, loss[loss=0.1145, beats_loss=0.008092, ecapa_loss=0.0001538, whisper_loss=0.1049, over 23438.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01026, ecapa_loss=0.0001388, whisper_loss=0.0892, over 3615695.15 frames. 
], batch size: 93, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:24:14,434 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 25 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-20 09:24:16,207 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 31 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-20 09:24:16,571 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.84 vs. limit=15.0 2024-08-20 09:24:17,125 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.327e+01 2.614e+01 2.843e+01 3.937e+01, threshold=5.228e+01, percent-clipped=0.0 2024-08-20 09:24:32,897 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4748480.0, ans=0.125 2024-08-20 09:24:36,253 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 18 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-20 09:24:41,571 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4748480.0, ans=0.0 2024-08-20 09:24:50,936 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4748580.0, ans=0.1 2024-08-20 09:24:50,938 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4748580.0, ans=0.125 2024-08-20 09:25:04,022 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4748680.0, ans=0.1 2024-08-20 09:25:04,049 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4748680.0, ans=0.025 2024-08-20 09:25:05,865 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 
36 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-20 09:25:21,277 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 700, loss[loss=0.1199, beats_loss=0.008218, ecapa_loss=0.0001537, whisper_loss=0.1101, over 18009.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01025, ecapa_loss=0.0001394, whisper_loss=0.08943, over 3648720.86 frames. ], batch size: 68, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:25:22,519 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4748780.0, ans=0.125 2024-08-20 09:25:33,284 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4748780.0, ans=0.125 2024-08-20 09:25:55,442 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4748980.0, ans=0.2 2024-08-20 09:26:08,073 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.60 vs. limit=15.0 2024-08-20 09:26:11,448 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4748980.0, ans=0.0 2024-08-20 09:26:14,641 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.21 vs. limit=22.5 2024-08-20 09:26:28,545 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4749080.0, ans=0.0 2024-08-20 09:26:40,921 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4749180.0, ans=0.125 2024-08-20 09:26:48,804 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 750, loss[loss=0.09251, beats_loss=0.01035, ecapa_loss=0.0001111, whisper_loss=0.08105, over 20252.00 frames. 
], tot_loss[loss=0.1015, beats_loss=0.0102, ecapa_loss=0.0001386, whisper_loss=0.0899, over 3663548.08 frames. ], batch size: 76, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:27:05,036 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4749380.0, ans=0.1 2024-08-20 09:27:13,062 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.850e+01 2.302e+01 2.530e+01 2.816e+01 3.828e+01, threshold=5.059e+01, percent-clipped=0.0 2024-08-20 09:27:24,429 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 20 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-20 09:27:31,727 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4749480.0, ans=0.0 2024-08-20 09:27:36,464 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4749480.0, ans=0.1 2024-08-20 09:28:05,071 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 18 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-20 09:28:18,119 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 800, loss[loss=0.1012, beats_loss=0.01029, ecapa_loss=0.0001149, whisper_loss=0.08974, over 16083.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01026, ecapa_loss=0.0001387, whisper_loss=0.08925, over 3676712.30 frames. ], batch size: 60, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:28:31,362 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.40 vs. limit=15.0 2024-08-20 09:28:36,816 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4749880.0, ans=0.0 2024-08-20 09:28:38,820 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 
30 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-20 09:28:39,127 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4749880.0, ans=0.0 2024-08-20 09:28:41,826 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 30 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-20 09:28:43,559 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4749880.0, ans=0.0 2024-08-20 09:28:59,835 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4749980.0, ans=0.125 2024-08-20 09:29:18,924 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 21 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-20 09:29:46,188 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 850, loss[loss=0.08298, beats_loss=0.007918, ecapa_loss=0.0001403, whisper_loss=0.07366, over 14264.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01024, ecapa_loss=0.0001381, whisper_loss=0.08872, over 3682266.08 frames. ], batch size: 55, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:29:48,563 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 
25 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-20 09:29:59,449 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4750280.0, ans=0.1 2024-08-20 09:30:11,285 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.195e+01 2.440e+01 2.729e+01 3.750e+01, threshold=4.881e+01, percent-clipped=0.0 2024-08-20 09:30:11,816 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4750380.0, ans=0.0 2024-08-20 09:30:22,594 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4750480.0, ans=0.1 2024-08-20 09:30:43,368 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 36 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-20 09:30:52,757 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.30 vs. limit=10.0 2024-08-20 09:31:02,675 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4750680.0, ans=0.125 2024-08-20 09:31:15,083 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.57 vs. limit=15.0 2024-08-20 09:31:15,783 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 900, loss[loss=0.07946, beats_loss=0.01148, ecapa_loss=0.0001108, whisper_loss=0.06687, over 15729.00 frames. ], tot_loss[loss=0.0997, beats_loss=0.01029, ecapa_loss=0.0001386, whisper_loss=0.08803, over 3689032.86 frames. 
], batch size: 60, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:31:18,203 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4750780.0, ans=0.125 2024-08-20 09:31:23,881 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4750780.0, ans=0.1 2024-08-20 09:31:28,732 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4750780.0, ans=0.125 2024-08-20 09:31:32,082 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4750880.0, ans=0.125 2024-08-20 09:31:32,153 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4750880.0, ans=0.125 2024-08-20 09:31:46,437 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4750880.0, ans=0.2 2024-08-20 09:32:00,761 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.36 vs. limit=15.0 2024-08-20 09:32:07,249 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 21 from LS+wenet, 21 from Vox, 49 fro AS 2024-08-20 09:32:35,547 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 19 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-20 09:32:43,812 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 950, loss[loss=0.1008, beats_loss=0.008006, ecapa_loss=0.0001262, whisper_loss=0.0915, over 18696.00 frames. ], tot_loss[loss=0.0993, beats_loss=0.01032, ecapa_loss=0.0001381, whisper_loss=0.0876, over 3676781.45 frames. 
], batch size: 71, lr: 1.86e-03, grad_scale: 1.152921504606847e+18 2024-08-20 09:32:59,084 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4751280.0, ans=0.125 2024-08-20 09:33:08,810 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.583e+01 2.210e+01 2.427e+01 2.730e+01 1.118e+02, threshold=4.854e+01, percent-clipped=2.0 2024-08-20 09:33:14,424 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 20 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-20 09:33:23,481 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4751480.0, ans=0.0 2024-08-20 09:33:32,140 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 14 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-20 09:33:36,132 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 18 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-20 09:33:43,759 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 37 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-20 09:34:10,227 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4751680.0, ans=0.1 2024-08-20 09:34:11,798 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 24 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-20 09:34:12,747 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 1000, loss[loss=0.09327, beats_loss=0.01055, ecapa_loss=0.0001393, whisper_loss=0.08133, over 21652.00 frames. ], tot_loss[loss=0.09958, beats_loss=0.01035, ecapa_loss=0.0001375, whisper_loss=0.08785, over 3698042.89 frames. ], batch size: 90, lr: 1.86e-03, grad_scale: 1.152921504606847e+18 2024-08-20 09:34:21,564 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4751780.0, ans=0.0 2024-08-20 09:34:50,422 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 
21 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-20 09:34:55,443 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4751980.0, ans=0.125 2024-08-20 09:35:05,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4751980.0, ans=0.0 2024-08-20 09:35:37,478 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 22 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-20 09:35:39,378 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4752180.0, ans=0.125 2024-08-20 09:35:42,296 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 1050, loss[loss=0.09696, beats_loss=0.01068, ecapa_loss=0.0001287, whisper_loss=0.08499, over 19839.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01034, ecapa_loss=0.0001374, whisper_loss=0.08847, over 3719534.46 frames. ], batch size: 77, lr: 1.86e-03, grad_scale: 1.152921504606847e+18 2024-08-20 09:36:08,978 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.242e+01 2.597e+01 2.833e+01 4.409e+01, threshold=5.195e+01, percent-clipped=0.0 2024-08-20 09:36:12,269 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 27 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-20 09:36:19,700 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4752480.0, ans=0.125 2024-08-20 09:36:26,080 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.36 vs. limit=15.0 2024-08-20 09:36:31,069 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.11 vs. limit=15.0 2024-08-20 09:36:43,034 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 
24 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-20 09:36:44,901 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 28 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-20 09:36:56,865 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 15 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-20 09:37:02,023 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4752680.0, ans=0.125 2024-08-20 09:37:04,071 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4752680.0, ans=0.1 2024-08-20 09:37:05,508 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4752680.0, ans=0.0 2024-08-20 09:37:05,912 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.58 vs. limit=10.0 2024-08-20 09:37:12,225 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 1100, loss[loss=0.1159, beats_loss=0.009843, ecapa_loss=0.0001353, whisper_loss=0.1047, over 19871.00 frames. ], tot_loss[loss=0.09995, beats_loss=0.01032, ecapa_loss=0.000138, whisper_loss=0.08826, over 3718909.01 frames. ], batch size: 77, lr: 1.86e-03, grad_scale: 1.152921504606847e+18 2024-08-20 09:37:35,095 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4752880.0, ans=0.1 2024-08-20 09:37:40,791 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4752880.0, ans=0.1 2024-08-20 09:37:44,706 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 14 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-20 09:38:04,290 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 
30 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-20 09:38:18,445 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4753080.0, ans=0.1 2024-08-20 09:38:24,528 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.15 vs. limit=22.5 2024-08-20 09:38:25,838 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4753180.0, ans=0.125 2024-08-20 09:38:29,422 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4753180.0, ans=0.0 2024-08-20 09:38:42,096 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 1150, loss[loss=0.1118, beats_loss=0.009496, ecapa_loss=0.0001377, whisper_loss=0.1009, over 17683.00 frames. ], tot_loss[loss=0.09968, beats_loss=0.01032, ecapa_loss=0.0001369, whisper_loss=0.08799, over 3688692.16 frames. ], batch size: 67, lr: 1.86e-03, grad_scale: 1.152921504606847e+18 2024-08-20 09:38:50,373 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4753280.0, ans=0.0 2024-08-20 09:38:51,444 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 19 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-20 09:39:06,832 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.346e+01 2.635e+01 2.990e+01 2.498e+02, threshold=5.271e+01, percent-clipped=4.0 2024-08-20 09:39:10,978 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0 2024-08-20 09:39:20,714 INFO [train_multi_KD3.py:845] (1/4) A total of 54 cuts. 13 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-20 09:39:39,090 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 
22 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-20 09:40:06,352 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 24 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-20 09:40:11,332 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 1200, loss[loss=0.1081, beats_loss=0.008623, ecapa_loss=0.000172, whisper_loss=0.0978, over 16190.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01033, ecapa_loss=0.0001366, whisper_loss=0.08886, over 3693895.62 frames. ], batch size: 63, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:40:15,754 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.79 vs. limit=15.0 2024-08-20 09:40:22,342 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 30 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-20 09:40:56,437 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4753980.0, ans=0.1 2024-08-20 09:41:03,734 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.31 vs. limit=22.5 2024-08-20 09:41:38,400 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4754280.0, ans=0.125 2024-08-20 09:41:38,484 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 09:41:39,237 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 1250, loss[loss=0.1049, beats_loss=0.007911, ecapa_loss=0.0001537, whisper_loss=0.09549, over 18694.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01036, ecapa_loss=0.0001365, whisper_loss=0.08912, over 3733577.56 frames. 
], batch size: 73, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:41:43,215 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.309e+01 2024-08-20 09:41:51,735 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 16 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-20 09:41:52,020 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4754280.0, ans=0.2 2024-08-20 09:42:05,567 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.283e+01 2.750e+01 2.979e+01 6.876e+01, threshold=5.500e+01, percent-clipped=2.0 2024-08-20 09:42:11,134 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4754380.0, ans=0.0 2024-08-20 09:42:14,479 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4754480.0, ans=0.0 2024-08-20 09:42:37,293 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 17 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-20 09:42:37,562 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=4754580.0, ans=0.05 2024-08-20 09:42:54,162 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 41 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-20 09:43:07,363 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 1300, loss[loss=0.09163, beats_loss=0.007797, ecapa_loss=0.0001769, whisper_loss=0.08207, over 15778.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.0103, ecapa_loss=0.0001379, whisper_loss=0.08946, over 3736934.28 frames. ], batch size: 66, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:43:10,628 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.71 vs. 
limit=10.0 2024-08-20 09:43:27,298 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 17 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-20 09:43:29,109 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 21 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-20 09:43:42,569 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4754980.0, ans=0.0 2024-08-20 09:43:49,380 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4754980.0, ans=0.125 2024-08-20 09:43:53,043 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=4754980.0, ans=0.05 2024-08-20 09:43:53,157 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4754980.0, ans=0.125 2024-08-20 09:44:04,135 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 21 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-20 09:44:14,619 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.78 vs. limit=15.0 2024-08-20 09:44:24,856 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4755180.0, ans=0.2 2024-08-20 09:44:27,532 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4755180.0, ans=0.125 2024-08-20 09:44:27,660 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4755180.0, ans=0.125 2024-08-20 09:44:37,940 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 1350, loss[loss=0.1003, beats_loss=0.01144, ecapa_loss=9.598e-05, whisper_loss=0.08794, over 19914.00 frames. 
], tot_loss[loss=0.1004, beats_loss=0.01044, ecapa_loss=0.0001378, whisper_loss=0.08861, over 3725111.92 frames. ], batch size: 75, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:44:45,509 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 26 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-20 09:45:01,428 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 14 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-20 09:45:04,640 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.756e+01 2.108e+01 2.418e+01 2.623e+01 3.290e+01, threshold=4.836e+01, percent-clipped=0.0 2024-08-20 09:45:15,610 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 24 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-20 09:45:22,837 WARNING [optim.py:496] (1/4) Scaling gradients by 0.032859351485967636, model_norm_threshold=48.36314392089844 2024-08-20 09:45:22,996 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.self_attn_weights.linear_pos.weight with proportion 0.15, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.250e+05, grad_sumsq=3.697e+04, orig_rms_sq=8.792e+00 2024-08-20 09:45:36,050 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4755580.0, ans=0.09899494936611666 2024-08-20 09:45:37,220 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 23 from LS+wenet, 14 from Vox, 16 fro AS 2024-08-20 09:45:53,586 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4755680.0, ans=0.1 2024-08-20 09:46:08,151 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 1400, loss[loss=0.06662, beats_loss=0.01182, ecapa_loss=0.0001235, whisper_loss=0.05356, over 18686.00 frames. ], tot_loss[loss=0.09959, beats_loss=0.01046, ecapa_loss=0.0001369, whisper_loss=0.08776, over 3722503.64 frames. 
], batch size: 74, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:46:10,216 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 23 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-20 09:46:26,092 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 19 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-20 09:46:36,860 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4755880.0, ans=0.125 2024-08-20 09:46:53,966 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4755980.0, ans=0.1 2024-08-20 09:47:07,775 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 28 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-20 09:47:09,459 INFO [train_multi_KD3.py:845] (1/4) A total of 54 cuts. 14 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-20 09:47:10,977 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 17 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-20 09:47:15,081 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4756080.0, ans=0.0 2024-08-20 09:47:18,648 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4756180.0, ans=0.1 2024-08-20 09:47:19,945 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 34 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-20 09:47:29,246 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4756180.0, ans=0.125 2024-08-20 09:47:35,270 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 1450, loss[loss=0.1085, beats_loss=0.009199, ecapa_loss=0.0001438, whisper_loss=0.09784, over 18744.00 frames. ], tot_loss[loss=0.09961, beats_loss=0.01037, ecapa_loss=0.000138, whisper_loss=0.08786, over 3735150.07 frames. 
], batch size: 74, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:47:49,422 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 29 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-20 09:48:01,103 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.208e+01 2.529e+01 2.783e+01 1.472e+03, threshold=5.058e+01, percent-clipped=1.0 2024-08-20 09:48:03,640 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4756380.0, ans=0.2 2024-08-20 09:48:12,918 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4756480.0, ans=0.125 2024-08-20 09:48:17,501 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 29 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-20 09:48:18,018 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.91 vs. limit=15.0 2024-08-20 09:48:21,547 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4756480.0, ans=0.125 2024-08-20 09:48:58,911 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4756580.0, ans=0.0 2024-08-20 09:49:00,326 INFO [train_multi_KD3.py:845] (1/4) A total of 52 cuts. 14 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-20 09:49:07,776 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4756580.0, ans=0.2 2024-08-20 09:49:30,533 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 1500, loss[loss=0.1014, beats_loss=0.01175, ecapa_loss=0.000112, whisper_loss=0.08852, over 22106.00 frames. ], tot_loss[loss=0.09911, beats_loss=0.01047, ecapa_loss=0.000137, whisper_loss=0.08727, over 3735157.26 frames. 
], batch size: 88, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:49:33,343 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 26 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-20 09:49:51,508 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 10 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-20 09:50:21,176 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 21 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-20 09:50:26,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4757080.0, ans=0.125 2024-08-20 09:50:32,722 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 27 from LS+wenet, 27 from Vox, 26 fro AS 2024-08-20 09:50:36,643 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4757080.0, ans=0.035 2024-08-20 09:50:36,654 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4757080.0, ans=0.1 2024-08-20 09:50:38,273 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4757080.0, ans=0.0 2024-08-20 09:50:47,473 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4757180.0, ans=0.125 2024-08-20 09:50:53,537 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.83 vs. limit=15.0 2024-08-20 09:51:03,014 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 1550, loss[loss=0.07591, beats_loss=0.009323, ecapa_loss=0.0001525, whisper_loss=0.06506, over 16864.00 frames. ], tot_loss[loss=0.0992, beats_loss=0.01043, ecapa_loss=0.0001371, whisper_loss=0.0874, over 3769118.74 frames. 
], batch size: 68, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:51:07,635 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4757280.0, ans=0.125 2024-08-20 09:51:30,427 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.660e+01 2.231e+01 2.477e+01 2.793e+01 4.044e+01, threshold=4.954e+01, percent-clipped=0.0 2024-08-20 09:51:52,512 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4757480.0, ans=0.125 2024-08-20 09:51:57,490 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 09:52:14,235 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 13 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-20 09:52:16,687 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4757680.0, ans=0.0 2024-08-20 09:52:17,822 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 25 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-20 09:52:18,046 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4757680.0, ans=0.2 2024-08-20 09:52:35,564 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 1600, loss[loss=0.1039, beats_loss=0.009299, ecapa_loss=0.0001054, whisper_loss=0.09354, over 25118.00 frames. ], tot_loss[loss=0.09962, beats_loss=0.01034, ecapa_loss=0.0001365, whisper_loss=0.08792, over 3730752.11 frames. ], batch size: 89, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:52:54,364 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 28 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-20 09:53:06,718 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 24 from LS+wenet, 9 from Vox, 31 fro AS 2024-08-20 09:53:08,454 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 
24 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-20 09:53:10,315 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 28 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-20 09:53:21,386 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4757980.0, ans=0.125 2024-08-20 09:53:38,511 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4758080.0, ans=0.0 2024-08-20 09:53:48,117 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4758180.0, ans=0.0 2024-08-20 09:53:52,747 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 22 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-20 09:53:54,645 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 26 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-20 09:54:06,364 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 1650, loss[loss=0.09041, beats_loss=0.01162, ecapa_loss=0.0001405, whisper_loss=0.07738, over 17180.00 frames. ], tot_loss[loss=0.09988, beats_loss=0.01031, ecapa_loss=0.0001368, whisper_loss=0.0882, over 3775345.59 frames. 
], batch size: 69, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:54:18,048 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4758280.0, ans=0.0 2024-08-20 09:54:32,271 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.203e+01 2.449e+01 2.785e+01 3.857e+01, threshold=4.898e+01, percent-clipped=0.0 2024-08-20 09:54:53,419 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4758480.0, ans=0.125 2024-08-20 09:54:56,858 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4758480.0, ans=0.125 2024-08-20 09:55:04,286 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4758580.0, ans=0.1 2024-08-20 09:55:11,967 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4758580.0, ans=0.2 2024-08-20 09:55:17,464 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4758680.0, ans=0.0 2024-08-20 09:55:35,277 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 1700, loss[loss=0.122, beats_loss=0.008193, ecapa_loss=0.0001396, whisper_loss=0.1124, over 20930.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.0103, ecapa_loss=0.0001368, whisper_loss=0.08925, over 3795867.68 frames. ], batch size: 79, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:55:40,055 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4758780.0, ans=0.5 2024-08-20 09:55:56,494 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. 
limit=6.0 2024-08-20 09:55:58,466 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4758880.0, ans=0.125 2024-08-20 09:56:20,089 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4758980.0, ans=0.0 2024-08-20 09:56:21,842 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4758980.0, ans=0.125 2024-08-20 09:56:26,654 INFO [train_multi_KD3.py:845] (1/4) A total of 49 cuts. 15 from LS+wenet, 18 from Vox, 16 fro AS 2024-08-20 09:56:28,188 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 22 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-20 09:56:43,162 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4759080.0, ans=0.125 2024-08-20 09:56:53,209 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 16 from LS+wenet, 27 from Vox, 25 fro AS 2024-08-20 09:56:54,658 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4759180.0, ans=0.0 2024-08-20 09:57:04,910 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 25 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-20 09:57:06,565 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 1750, loss[loss=0.09187, beats_loss=0.01058, ecapa_loss=0.0001334, whisper_loss=0.07995, over 20790.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01025, ecapa_loss=0.0001375, whisper_loss=0.08945, over 3804401.34 frames. 
], batch size: 84, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:57:08,721 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4759280.0, ans=0.0 2024-08-20 09:57:33,287 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.295e+01 2.510e+01 2.716e+01 9.441e+01, threshold=5.020e+01, percent-clipped=1.0 2024-08-20 09:57:51,529 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4759480.0, ans=0.2 2024-08-20 09:58:01,444 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 30 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-20 09:58:06,623 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4759580.0, ans=0.125 2024-08-20 09:58:09,678 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4759580.0, ans=0.0 2024-08-20 09:58:09,776 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4759580.0, ans=0.2 2024-08-20 09:58:21,930 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4759680.0, ans=0.2 2024-08-20 09:58:23,675 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4759680.0, ans=0.125 2024-08-20 09:58:23,726 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4759680.0, ans=0.0 2024-08-20 09:58:34,333 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 1800, loss[loss=0.08961, beats_loss=0.01148, ecapa_loss=0.0001152, whisper_loss=0.07698, over 17133.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01032, ecapa_loss=0.0001362, whisper_loss=0.08912, over 3810227.28 frames. 
], batch size: 68, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:58:34,562 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 21 from LS+wenet, 10 from Vox, 20 fro AS 2024-08-20 09:58:53,410 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.55 vs. limit=6.0 2024-08-20 09:59:14,945 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 27 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-20 09:59:25,621 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4760080.0, ans=0.0 2024-08-20 09:59:41,175 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 22 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-20 09:59:53,695 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 22 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-20 10:00:01,331 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 1850, loss[loss=0.1104, beats_loss=0.007798, ecapa_loss=0.0001568, whisper_loss=0.101, over 15037.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01031, ecapa_loss=0.0001366, whisper_loss=0.08934, over 3804079.48 frames. ], batch size: 58, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:00:02,540 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.35 vs. limit=15.0 2024-08-20 10:00:06,906 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 30 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-20 10:00:14,690 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.38 vs. 
limit=15.0 2024-08-20 10:00:24,700 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4760380.0, ans=0.2 2024-08-20 10:00:27,072 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.780e+01 2.285e+01 2.493e+01 2.881e+01 4.103e+01, threshold=4.985e+01, percent-clipped=0.0 2024-08-20 10:00:32,848 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 23 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-20 10:00:36,415 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4760480.0, ans=0.1 2024-08-20 10:00:46,180 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 26 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-20 10:00:58,123 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.06 vs. limit=15.0 2024-08-20 10:01:02,671 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.48 vs. limit=22.5 2024-08-20 10:01:09,664 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 17 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-20 10:01:14,641 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4760680.0, ans=0.0 2024-08-20 10:01:27,716 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4760780.0, ans=0.0 2024-08-20 10:01:28,636 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 1900, loss[loss=0.1115, beats_loss=0.01069, ecapa_loss=9.609e-05, whisper_loss=0.09986, over 15481.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01035, ecapa_loss=0.0001364, whisper_loss=0.08933, over 3832844.25 frames. 
], batch size: 55, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:01:36,733 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 16 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-20 10:01:42,638 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4760780.0, ans=0.2 2024-08-20 10:02:06,109 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 24 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-20 10:02:22,437 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 26 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-20 10:02:24,095 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4761080.0, ans=0.125 2024-08-20 10:02:34,581 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4761080.0, ans=0.125 2024-08-20 10:02:39,860 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4761180.0, ans=0.2 2024-08-20 10:02:43,582 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 10:02:49,994 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 21 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-20 10:02:51,505 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 25 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-20 10:02:54,995 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 1950, loss[loss=0.0951, beats_loss=0.01463, ecapa_loss=0.0001024, whisper_loss=0.07944, over 23584.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01045, ecapa_loss=0.0001354, whisper_loss=0.08919, over 3832111.89 frames. 
], batch size: 95, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:03:02,354 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4761280.0, ans=0.125 2024-08-20 10:03:06,117 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4761280.0, ans=0.125 2024-08-20 10:03:19,911 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.238e+01 2.475e+01 2.855e+01 5.978e+01, threshold=4.950e+01, percent-clipped=1.0 2024-08-20 10:03:20,161 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 20 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-20 10:03:29,081 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4761480.0, ans=0.1 2024-08-20 10:03:44,514 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4761580.0, ans=0.0 2024-08-20 10:03:46,109 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4761580.0, ans=0.0 2024-08-20 10:03:57,907 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 27 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-20 10:04:00,181 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 24 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-20 10:04:10,831 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 10:04:20,882 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 2000, loss[loss=0.1062, beats_loss=0.01059, ecapa_loss=0.0001334, whisper_loss=0.09423, over 21632.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01046, ecapa_loss=0.000136, whisper_loss=0.08903, over 3839487.56 frames. 
], batch size: 87, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:04:23,974 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.74 vs. limit=15.0 2024-08-20 10:04:34,364 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4761780.0, ans=0.0 2024-08-20 10:04:42,948 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4761880.0, ans=0.125 2024-08-20 10:05:05,805 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4761980.0, ans=0.125 2024-08-20 10:05:13,188 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4762080.0, ans=0.07 2024-08-20 10:05:13,485 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.55 vs. limit=15.0 2024-08-20 10:05:42,455 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 18 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-20 10:05:44,392 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4762180.0, ans=0.1 2024-08-20 10:05:46,734 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 2050, loss[loss=0.09353, beats_loss=0.01332, ecapa_loss=0.0001238, whisper_loss=0.07897, over 18098.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01039, ecapa_loss=0.0001349, whisper_loss=0.08972, over 3804103.48 frames. 
], batch size: 75, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:06:13,897 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.749e+01 2.228e+01 2.469e+01 2.687e+01 4.353e+01, threshold=4.939e+01, percent-clipped=0.0 2024-08-20 10:06:15,989 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.05 vs. limit=15.0 2024-08-20 10:06:19,317 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4762380.0, ans=0.0 2024-08-20 10:06:29,195 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 22 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-20 10:06:35,849 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 16 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-20 10:06:50,530 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4762580.0, ans=0.125 2024-08-20 10:06:51,977 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4762580.0, ans=0.125 2024-08-20 10:07:04,066 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4762680.0, ans=0.1 2024-08-20 10:07:12,938 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 2100, loss[loss=0.1058, beats_loss=0.009584, ecapa_loss=0.0001173, whisper_loss=0.09505, over 21129.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01035, ecapa_loss=0.000134, whisper_loss=0.08944, over 3771970.26 frames. 
], batch size: 81, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:07:56,521 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4762980.0, ans=0.125 2024-08-20 10:08:06,731 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4763080.0, ans=0.0 2024-08-20 10:08:33,178 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4763180.0, ans=0.0 2024-08-20 10:08:38,805 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 2150, loss[loss=0.1191, beats_loss=0.0117, ecapa_loss=0.0001072, whisper_loss=0.1063, over 19688.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01037, ecapa_loss=0.0001338, whisper_loss=0.08963, over 3756180.61 frames. ], batch size: 76, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:08:43,189 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4763280.0, ans=0.0 2024-08-20 10:08:57,415 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4763380.0, ans=0.0 2024-08-20 10:08:58,904 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4763380.0, ans=0.0 2024-08-20 10:09:05,279 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.276e+01 2.500e+01 2.856e+01 5.859e+01, threshold=5.000e+01, percent-clipped=1.0 2024-08-20 10:09:07,758 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4763380.0, ans=0.125 2024-08-20 10:09:08,092 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. 
limit=6.0 2024-08-20 10:09:34,261 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 28 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-20 10:09:39,090 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4763580.0, ans=0.95 2024-08-20 10:10:02,140 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 26 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-20 10:10:05,302 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 2200, loss[loss=0.1057, beats_loss=0.009395, ecapa_loss=0.0001393, whisper_loss=0.0949, over 21834.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01037, ecapa_loss=0.0001339, whisper_loss=0.08958, over 3759810.74 frames. ], batch size: 90, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:10:28,645 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.12 vs. limit=15.0 2024-08-20 10:11:09,576 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 17 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-20 10:11:19,385 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 26 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-20 10:11:30,026 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 2250, loss[loss=0.1103, beats_loss=0.01014, ecapa_loss=0.0001494, whisper_loss=0.09863, over 17807.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01037, ecapa_loss=0.0001337, whisper_loss=0.09014, over 3752872.91 frames. ], batch size: 72, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:11:36,128 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4764280.0, ans=10.0 2024-08-20 10:11:40,764 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 
29 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-20 10:11:43,735 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4764280.0, ans=0.09899494936611666 2024-08-20 10:11:54,456 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4764380.0, ans=0.125 2024-08-20 10:11:55,373 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.206e+01 2.415e+01 2.754e+01 4.736e+01, threshold=4.831e+01, percent-clipped=0.0 2024-08-20 10:11:57,726 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 27 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-20 10:12:08,099 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4764480.0, ans=0.125 2024-08-20 10:12:18,438 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4764480.0, ans=0.125 2024-08-20 10:12:21,810 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4764580.0, ans=0.125 2024-08-20 10:12:25,556 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4764580.0, ans=0.0 2024-08-20 10:12:38,847 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 33 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-20 10:12:55,543 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 2300, loss[loss=0.1077, beats_loss=0.009146, ecapa_loss=0.0001599, whisper_loss=0.09697, over 18966.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01039, ecapa_loss=0.0001341, whisper_loss=0.09066, over 3797356.79 frames. 
], batch size: 76, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:13:25,034 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4764880.0, ans=0.125 2024-08-20 10:13:26,309 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 28 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-20 10:13:43,626 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 29 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-20 10:13:51,416 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 23 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-20 10:14:03,342 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4765180.0, ans=0.125 2024-08-20 10:14:06,598 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 19 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-20 10:14:10,124 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 22 from LS+wenet, 12 from Vox, 42 fro AS 2024-08-20 10:14:15,838 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4765180.0, ans=0.125 2024-08-20 10:14:21,862 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 2350, loss[loss=0.1201, beats_loss=0.00998, ecapa_loss=0.000131, whisper_loss=0.1088, over 19008.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01048, ecapa_loss=0.0001348, whisper_loss=0.0906, over 3799120.13 frames. ], batch size: 75, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:14:27,242 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 37 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-20 10:14:45,243 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 
16 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-20 10:14:48,457 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.264e+01 2.559e+01 2.893e+01 5.027e+01, threshold=5.117e+01, percent-clipped=1.0 2024-08-20 10:15:14,946 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4765580.0, ans=0.0 2024-08-20 10:15:16,901 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.17 vs. limit=22.5 2024-08-20 10:15:29,267 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4765680.0, ans=0.0 2024-08-20 10:15:36,123 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4765680.0, ans=0.2 2024-08-20 10:15:37,804 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4765680.0, ans=0.0 2024-08-20 10:15:46,591 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 2400, loss[loss=0.09792, beats_loss=0.01208, ecapa_loss=0.0001253, whisper_loss=0.08459, over 22899.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01042, ecapa_loss=0.0001361, whisper_loss=0.09111, over 3827550.61 frames. ], batch size: 94, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:15:47,869 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.95 vs. limit=15.0 2024-08-20 10:16:00,861 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 
12 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-20 10:16:11,655 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4765880.0, ans=0.0 2024-08-20 10:16:11,693 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=5.795e-01 2024-08-20 10:16:14,678 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4765880.0, ans=0.2 2024-08-20 10:16:14,809 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=7.763e+00 2024-08-20 10:16:29,775 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4765980.0, ans=0.0 2024-08-20 10:16:43,977 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.37 vs. limit=15.0 2024-08-20 10:16:52,454 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.64 vs. limit=15.0 2024-08-20 10:17:00,232 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4766180.0, ans=0.125 2024-08-20 10:17:07,154 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 13 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-20 10:17:11,565 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 2450, loss[loss=0.1169, beats_loss=0.009895, ecapa_loss=0.0001571, whisper_loss=0.1054, over 22260.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01034, ecapa_loss=0.0001367, whisper_loss=0.09081, over 3820491.81 frames. 
], batch size: 92, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:17:17,612 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4766280.0, ans=0.0 2024-08-20 10:17:38,559 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.311e+01 2.583e+01 2.758e+01 5.133e+01, threshold=5.165e+01, percent-clipped=1.0 2024-08-20 10:17:38,909 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4766380.0, ans=0.0 2024-08-20 10:17:49,283 INFO [train_multi_KD3.py:845] (1/4) A total of 95 cuts. 24 from LS+wenet, 25 from Vox, 46 fro AS 2024-08-20 10:17:56,587 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.63 vs. limit=15.0 2024-08-20 10:18:04,041 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4766580.0, ans=0.07 2024-08-20 10:18:09,041 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 31 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-20 10:18:40,982 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 2500, loss[loss=0.125, beats_loss=0.008188, ecapa_loss=0.0001477, whisper_loss=0.1153, over 19958.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01028, ecapa_loss=0.000137, whisper_loss=0.09125, over 3865452.39 frames. ], batch size: 76, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:18:41,541 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4766780.0, ans=0.0 2024-08-20 10:18:44,980 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4766780.0, ans=0.0 2024-08-20 10:19:16,164 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 
22 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-20 10:19:18,297 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4766980.0, ans=0.0 2024-08-20 10:19:33,252 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4766980.0, ans=0.125 2024-08-20 10:19:41,060 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 26 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-20 10:19:57,197 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4767180.0, ans=0.07 2024-08-20 10:19:57,298 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.519e+01 2024-08-20 10:20:09,308 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=4767180.0, ans=0.025 2024-08-20 10:20:12,090 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 2550, loss[loss=0.1151, beats_loss=0.01061, ecapa_loss=0.0001243, whisper_loss=0.1032, over 16686.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01031, ecapa_loss=0.0001368, whisper_loss=0.09099, over 3830463.50 frames. ], batch size: 66, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:20:21,554 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 
24 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-20 10:20:39,332 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.312e+01 2.481e+01 2.687e+01 3.912e+01, threshold=4.961e+01, percent-clipped=0.0 2024-08-20 10:21:05,846 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4767580.0, ans=0.125 2024-08-20 10:21:16,772 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4767580.0, ans=0.2 2024-08-20 10:21:19,590 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 17 from LS+wenet, 22 from Vox, 18 fro AS 2024-08-20 10:21:28,106 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 26 from LS+wenet, 30 from Vox, 31 fro AS 2024-08-20 10:21:33,856 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4767680.0, ans=0.125 2024-08-20 10:21:42,277 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 2600, loss[loss=0.0846, beats_loss=0.01097, ecapa_loss=0.0001027, whisper_loss=0.0726, over 15611.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01035, ecapa_loss=0.0001365, whisper_loss=0.09005, over 3801500.01 frames. ], batch size: 58, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:22:04,883 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 
26 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-20 10:22:17,382 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4767980.0, ans=0.125 2024-08-20 10:22:19,977 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4767980.0, ans=0.2 2024-08-20 10:22:26,704 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4767980.0, ans=0.04949747468305833 2024-08-20 10:22:41,437 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=7.13 vs. limit=12.0 2024-08-20 10:22:58,222 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4768180.0, ans=0.125 2024-08-20 10:23:10,965 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 2650, loss[loss=0.09562, beats_loss=0.009528, ecapa_loss=0.0001306, whisper_loss=0.08478, over 22216.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01032, ecapa_loss=0.0001366, whisper_loss=0.09019, over 3803260.52 frames. ], batch size: 87, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:23:22,082 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4768280.0, ans=0.125 2024-08-20 10:23:38,221 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.716e+01 2.210e+01 2.428e+01 2.721e+01 4.084e+01, threshold=4.855e+01, percent-clipped=0.0 2024-08-20 10:23:38,476 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 14 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-20 10:23:40,100 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 
21 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-20 10:24:04,503 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4768580.0, ans=0.07 2024-08-20 10:24:10,019 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4768580.0, ans=0.125 2024-08-20 10:24:15,568 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4768580.0, ans=0.0 2024-08-20 10:24:31,391 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4768680.0, ans=0.125 2024-08-20 10:24:41,439 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 2700, loss[loss=0.1223, beats_loss=0.009113, ecapa_loss=0.0001478, whisper_loss=0.1117, over 19502.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0103, ecapa_loss=0.0001377, whisper_loss=0.0901, over 3781375.66 frames. ], batch size: 79, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:24:54,821 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.14 vs. limit=15.0 2024-08-20 10:25:08,001 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 28 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-20 10:25:26,227 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 16 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-20 10:25:26,441 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4768980.0, ans=0.1 2024-08-20 10:25:54,758 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 25 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-20 10:25:56,769 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.96 vs. 
limit=15.0 2024-08-20 10:26:05,299 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 21 from LS+wenet, 27 from Vox, 44 fro AS 2024-08-20 10:26:12,618 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 2750, loss[loss=0.1218, beats_loss=0.009247, ecapa_loss=0.0001107, whisper_loss=0.1115, over 18284.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01032, ecapa_loss=0.0001374, whisper_loss=0.09046, over 3773955.91 frames. ], batch size: 65, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:26:13,812 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.666e-02 2024-08-20 10:26:33,531 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 40 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-20 10:26:38,583 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.290e+01 2.547e+01 2.861e+01 3.965e+01, threshold=5.094e+01, percent-clipped=0.0 2024-08-20 10:27:28,155 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4769680.0, ans=0.125 2024-08-20 10:27:29,605 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4769680.0, ans=0.125 2024-08-20 10:27:37,837 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.96 vs. limit=15.0 2024-08-20 10:27:41,794 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 2800, loss[loss=0.1028, beats_loss=0.01114, ecapa_loss=0.0001048, whisper_loss=0.09063, over 21165.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0103, ecapa_loss=0.0001366, whisper_loss=0.09102, over 3772084.76 frames. 
], batch size: 80, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:27:51,055 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4769780.0, ans=0.0 2024-08-20 10:28:03,558 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4769880.0, ans=0.125 2024-08-20 10:28:09,971 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 28 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-20 10:28:19,501 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.71 vs. limit=15.0 2024-08-20 10:28:22,384 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 20 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-20 10:28:46,914 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4770080.0, ans=0.07 2024-08-20 10:29:02,882 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.57 vs. limit=12.0 2024-08-20 10:29:07,406 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 20 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-20 10:29:10,468 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 2850, loss[loss=0.0993, beats_loss=0.008778, ecapa_loss=0.0001425, whisper_loss=0.08909, over 16368.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01024, ecapa_loss=0.0001371, whisper_loss=0.09047, over 3764255.31 frames. 
], batch size: 66, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:29:27,061 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4770380.0, ans=0.0 2024-08-20 10:29:37,329 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.626e+01 2.251e+01 2.450e+01 2.765e+01 5.044e+01, threshold=4.900e+01, percent-clipped=0.0 2024-08-20 10:30:23,961 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 10:30:30,456 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4770680.0, ans=0.125 2024-08-20 10:30:37,735 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4770780.0, ans=0.125 2024-08-20 10:30:38,577 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 2900, loss[loss=0.09564, beats_loss=0.01205, ecapa_loss=0.0001648, whisper_loss=0.08194, over 15307.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01025, ecapa_loss=0.0001382, whisper_loss=0.0908, over 3773091.51 frames. ], batch size: 67, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:30:39,112 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 22 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-20 10:30:41,186 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=5.215e-01 2024-08-20 10:30:47,252 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 29 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-20 10:30:48,956 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 18 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-20 10:30:52,143 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.38 vs. 
limit=10.0 2024-08-20 10:31:13,776 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=4770980.0, ans=0.5 2024-08-20 10:31:36,362 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4771080.0, ans=0.1 2024-08-20 10:31:54,307 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 29 from LS+wenet, 34 from Vox, 29 fro AS 2024-08-20 10:32:08,262 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 2950, loss[loss=0.0888, beats_loss=0.01117, ecapa_loss=0.000127, whisper_loss=0.07636, over 18857.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01022, ecapa_loss=0.0001392, whisper_loss=0.09123, over 3809856.50 frames. ], batch size: 75, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:32:18,324 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 29 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-20 10:32:35,045 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.026e+01 2.380e+01 2.528e+01 2.893e+01 7.268e+01, threshold=5.056e+01, percent-clipped=1.0 2024-08-20 10:32:37,820 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4771380.0, ans=0.125 2024-08-20 10:32:54,705 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4771480.0, ans=0.125 2024-08-20 10:33:03,632 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 17 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-20 10:33:30,635 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4771680.0, ans=0.5 2024-08-20 10:33:37,799 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 3000, loss[loss=0.1054, beats_loss=0.01329, ecapa_loss=0.0001132, whisper_loss=0.09095, over 22296.00 frames. 
], tot_loss[loss=0.1024, beats_loss=0.01031, ecapa_loss=0.0001394, whisper_loss=0.09067, over 3824914.72 frames. ], batch size: 87, lr: 1.86e-03, grad_scale: 5.764607523034235e+17
2024-08-20 10:33:37,799 INFO [train_multi_KD3.py:1140] (1/4) Computing validation loss
2024-08-20 10:34:13,869 INFO [train_multi_KD3.py:1150] (1/4) Epoch 33, validation on ASR_libri: loss=0.2557, beats_loss=0, ecapa_loss=0.0005125, whisper_loss=0.2506, over 931116.00 frames.
2024-08-20 10:34:36,495 INFO [train_multi_KD3.py:1150] (1/4) Epoch 33, validation on SV_voxceleb1: loss=0.003928, beats_loss=0, ecapa_loss=0.0003928, whisper_loss=0, over 944235.00 frames.
2024-08-20 10:36:13,006 INFO [train_multi_KD3.py:1150] (1/4) Epoch 33, validation on AT_audioset: loss=0.023, beats_loss=0.023, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-20 10:36:13,010 INFO [train_multi_KD3.py:1156] (1/4) Maximum memory allocated so far is 30838MB
2024-08-20 10:36:21,319 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4771780.0, ans=0.125
2024-08-20 10:36:29,991 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.84 vs. limit=22.5
2024-08-20 10:36:43,238 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4771880.0, ans=0.0
2024-08-20 10:36:58,801 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 34 from LS+wenet, 13 from Vox, 42 from AS
2024-08-20 10:36:59,363 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.27 vs. limit=22.5
2024-08-20 10:37:12,351 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts.
32 from LS+wenet, 22 from Vox, 23 from AS
2024-08-20 10:37:14,403 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4772080.0, ans=0.125
2024-08-20 10:37:25,318 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4772180.0, ans=0.1
2024-08-20 10:37:33,586 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 3050, loss[loss=0.1025, beats_loss=0.01, ecapa_loss=0.0001337, whisper_loss=0.09113, over 13691.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01029, ecapa_loss=0.0001397, whisper_loss=0.09108, over 3814418.51 frames. ], batch size: 54, lr: 1.86e-03, grad_scale: 5.764607523034235e+17
2024-08-20 10:37:39,936 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4772280.0, ans=0.0
2024-08-20 10:37:58,955 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.325e+01 2.539e+01 2.982e+01 4.388e+01, threshold=5.078e+01, percent-clipped=0.0
2024-08-20 10:38:02,211 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 32 from LS+wenet, 17 from Vox, 41 from AS
2024-08-20 10:38:11,437 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4772480.0, ans=0.0
2024-08-20 10:38:15,862 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=4772480.0, ans=15.0
2024-08-20 10:38:19,706 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts.
27 from LS+wenet, 18 from Vox, 26 from AS
2024-08-20 10:38:20,199 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4772480.0, ans=0.125
2024-08-20 10:38:24,771 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4772580.0, ans=0.125
2024-08-20 10:38:33,577 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4772580.0, ans=0.125
2024-08-20 10:38:40,814 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.55 vs. limit=22.5
2024-08-20 10:38:53,010 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4772680.0, ans=0.125
2024-08-20 10:38:55,598 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 3100, loss[loss=0.093, beats_loss=0.01018, ecapa_loss=0.0001631, whisper_loss=0.08119, over 18746.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01031, ecapa_loss=0.0001405, whisper_loss=0.09111, over 3828864.63 frames. ], batch size: 77, lr: 1.86e-03, grad_scale: 5.764607523034235e+17
2024-08-20 10:39:03,803 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts.
17 from LS+wenet, 20 from Vox, 26 from AS
2024-08-20 10:39:16,315 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4772880.0, ans=0.125
2024-08-20 10:39:24,411 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-20 10:39:27,799 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4772980.0, ans=0.1
2024-08-20 10:39:34,120 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4772980.0, ans=0.1
2024-08-20 10:39:35,680 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.93 vs. limit=15.0
2024-08-20 10:39:37,760 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4772980.0, ans=0.125
2024-08-20 10:39:58,521 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4773080.0, ans=0.125
2024-08-20 10:40:04,996 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4773180.0, ans=0.125
2024-08-20 10:40:09,862 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4773180.0, ans=0.0
2024-08-20 10:40:10,853 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 28 from LS+wenet, 17 from Vox, 48 from AS
2024-08-20 10:40:13,168 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.32 vs.
limit=6.0
2024-08-20 10:40:17,699 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 3150, loss[loss=0.1015, beats_loss=0.009332, ecapa_loss=0.0001579, whisper_loss=0.09062, over 18577.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01035, ecapa_loss=0.0001408, whisper_loss=0.09077, over 3841546.22 frames. ], batch size: 75, lr: 1.86e-03, grad_scale: 5.764607523034235e+17
2024-08-20 10:40:19,986 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.93 vs. limit=12.0
2024-08-20 10:40:30,412 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.15 vs. limit=15.0
2024-08-20 10:40:40,051 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4773380.0, ans=0.2
2024-08-20 10:40:41,248 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4773380.0, ans=0.1
2024-08-20 10:40:42,188 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.904e+01 2.206e+01 2.480e+01 2.972e+01 5.332e+01, threshold=4.960e+01, percent-clipped=1.0
2024-08-20 10:40:51,267 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4773480.0, ans=0.125
2024-08-20 10:41:10,110 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.23 vs.
limit=15.0
2024-08-20 10:41:26,005 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4773680.0, ans=0.125
2024-08-20 10:41:30,740 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4773680.0, ans=0.2
2024-08-20 10:41:37,038 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 15 from LS+wenet, 18 from Vox, 25 from AS
2024-08-20 10:41:37,349 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4773780.0, ans=0.0
2024-08-20 10:41:37,356 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4773780.0, ans=0.125
2024-08-20 10:41:37,579 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0
2024-08-20 10:41:38,129 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 3200, loss[loss=0.0822, beats_loss=0.01013, ecapa_loss=0.000158, whisper_loss=0.07049, over 15325.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01033, ecapa_loss=0.0001405, whisper_loss=0.09157, over 3830679.78 frames. ], batch size: 58, lr: 1.86e-03, grad_scale: 5.764607523034235e+17
2024-08-20 10:41:50,674 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4773780.0, ans=0.125
2024-08-20 10:41:59,844 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4773880.0, ans=0.2
2024-08-20 10:42:08,250 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.00 vs.
limit=15.0
2024-08-20 10:42:19,930 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4773980.0, ans=0.0
2024-08-20 10:42:21,567 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 18 from LS+wenet, 18 from Vox, 36 from AS
2024-08-20 10:42:24,149 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.78 vs. limit=15.0
2024-08-20 10:42:27,700 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 18 from LS+wenet, 28 from Vox, 30 from AS
2024-08-20 10:42:27,986 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4774080.0, ans=0.0
2024-08-20 10:42:35,798 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4774080.0, ans=0.1
2024-08-20 10:42:37,165 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 23 from LS+wenet, 21 from Vox, 48 from AS
2024-08-20 10:42:40,842 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4774080.0, ans=0.125
2024-08-20 10:42:59,581 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 3250, loss[loss=0.1231, beats_loss=0.007575, ecapa_loss=0.0001689, whisper_loss=0.1138, over 22379.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01041, ecapa_loss=0.0001395, whisper_loss=0.09095, over 3804466.61 frames. ], batch size: 90, lr: 1.86e-03, grad_scale: 5.764607523034235e+17
2024-08-20 10:43:02,182 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4774280.0, ans=0.125
2024-08-20 10:43:04,529 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 31 from LS+wenet, 21 from Vox, 36 from AS
2024-08-20 10:43:09,214 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts.
17 from LS+wenet, 25 from Vox, 36 from AS
2024-08-20 10:43:18,574 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.36 vs. limit=22.5
2024-08-20 10:43:23,006 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4774380.0, ans=0.0
2024-08-20 10:43:25,945 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.267e+01 2.440e+01 2.710e+01 3.634e+01, threshold=4.881e+01, percent-clipped=0.0
2024-08-20 10:44:06,800 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.52 vs. limit=15.0
2024-08-20 10:44:10,358 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.82 vs. limit=22.5
2024-08-20 10:44:25,478 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 3300, loss[loss=0.1002, beats_loss=0.01146, ecapa_loss=0.0001378, whisper_loss=0.08735, over 17591.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01041, ecapa_loss=0.0001387, whisper_loss=0.09127, over 3806835.28 frames. ], batch size: 72, lr: 1.86e-03, grad_scale: 5.764607523034235e+17
2024-08-20 10:44:32,967 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4774780.0, ans=0.125
2024-08-20 10:44:39,805 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4774780.0, ans=0.125
2024-08-20 10:44:58,731 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts.
29 from LS+wenet, 17 from Vox, 18 from AS
2024-08-20 10:45:14,320 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4774980.0, ans=0.025
2024-08-20 10:45:26,174 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4775080.0, ans=0.125
2024-08-20 10:45:26,492 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=4775080.0, ans=0.02
2024-08-20 10:45:34,306 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 30 from LS+wenet, 18 from Vox, 44 from AS
2024-08-20 10:45:50,683 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 3350, loss[loss=0.1042, beats_loss=0.01057, ecapa_loss=0.0001478, whisper_loss=0.09211, over 12967.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01036, ecapa_loss=0.0001398, whisper_loss=0.09118, over 3804746.22 frames. ], batch size: 52, lr: 1.86e-03, grad_scale: 5.764607523034235e+17
2024-08-20 10:45:51,359 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=4775280.0, ans=10.0
2024-08-20 10:45:51,711 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.87 vs. limit=15.0
2024-08-20 10:45:54,308 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts.
22 from LS+wenet, 19 from Vox, 23 from AS
2024-08-20 10:45:54,588 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4775280.0, ans=0.125
2024-08-20 10:45:57,952 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4775280.0, ans=0.04949747468305833
2024-08-20 10:46:02,864 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4775280.0, ans=0.1
2024-08-20 10:46:10,580 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 32 from LS+wenet, 16 from Vox, 38 from AS
2024-08-20 10:46:10,868 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4775380.0, ans=0.1
2024-08-20 10:46:12,706 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4775380.0, ans=0.125
2024-08-20 10:46:14,631 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4775380.0, ans=0.125
2024-08-20 10:46:14,728 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=9.151e+00
2024-08-20 10:46:16,919 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.283e+01 2.507e+01 2.665e+01 5.653e+01, threshold=5.015e+01, percent-clipped=1.0
2024-08-20 10:46:27,147 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts.
26 from LS+wenet, 12 from Vox, 17 from AS
2024-08-20 10:46:46,871 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4775580.0, ans=0.1
2024-08-20 10:47:08,905 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4775680.0, ans=0.1
2024-08-20 10:47:11,601 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 26 from LS+wenet, 15 from Vox, 27 from AS
2024-08-20 10:47:11,931 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4775780.0, ans=0.0
2024-08-20 10:47:13,172 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 3400, loss[loss=0.1171, beats_loss=0.00993, ecapa_loss=0.0001266, whisper_loss=0.1059, over 17519.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01029, ecapa_loss=0.0001393, whisper_loss=0.09146, over 3791970.63 frames. ], batch size: 68, lr: 1.86e-03, grad_scale: 5.764607523034235e+17
2024-08-20 10:47:22,055 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 21 from LS+wenet, 21 from Vox, 27 from AS
2024-08-20 10:47:47,195 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4775980.0, ans=0.125
2024-08-20 10:48:24,668 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.98 vs. limit=15.0
2024-08-20 10:48:36,716 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 3450, loss[loss=0.09654, beats_loss=0.008689, ecapa_loss=0.0001573, whisper_loss=0.08628, over 16334.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01018, ecapa_loss=0.0001406, whisper_loss=0.09204, over 3805700.60 frames. ], batch size: 63, lr: 1.86e-03, grad_scale: 5.764607523034235e+17
2024-08-20 10:48:37,255 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts.
29 from LS+wenet, 25 from Vox, 34 from AS
2024-08-20 10:48:40,634 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 27 from LS+wenet, 19 from Vox, 32 from AS
2024-08-20 10:49:02,941 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.189e+01 2.533e+01 2.792e+01 4.546e+01, threshold=5.066e+01, percent-clipped=0.0
2024-08-20 10:49:09,211 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4776480.0, ans=0.2
2024-08-20 10:49:15,594 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4776480.0, ans=0.0
2024-08-20 10:49:20,993 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.79 vs. limit=15.0
2024-08-20 10:49:23,750 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4776480.0, ans=0.0
2024-08-20 10:49:30,818 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=10.52 vs. limit=12.0
2024-08-20 10:49:33,764 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4776580.0, ans=0.125
2024-08-20 10:49:59,514 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 3500, loss[loss=0.1132, beats_loss=0.01221, ecapa_loss=0.000123, whisper_loss=0.09972, over 23245.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01032, ecapa_loss=0.0001405, whisper_loss=0.09103, over 3826680.42 frames.
], batch size: 91, lr: 1.86e-03, grad_scale: 5.764607523034235e+17
2024-08-20 10:50:15,476 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4776880.0, ans=0.125
2024-08-20 10:50:36,435 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4776980.0, ans=0.125
2024-08-20 10:50:40,585 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.96 vs. limit=6.0
2024-08-20 10:50:46,384 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 21 from LS+wenet, 25 from Vox, 31 from AS
2024-08-20 10:51:00,831 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.03 vs. limit=15.0
2024-08-20 10:51:11,383 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4777180.0, ans=0.125
2024-08-20 10:51:25,489 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 3550, loss[loss=0.1135, beats_loss=0.007641, ecapa_loss=0.0001404, whisper_loss=0.1045, over 14806.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0103, ecapa_loss=0.0001397, whisper_loss=0.09074, over 3825154.65 frames.
], batch size: 57, lr: 1.86e-03, grad_scale: 5.764607523034235e+17
2024-08-20 10:51:50,936 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4777380.0, ans=0.09899494936611666
2024-08-20 10:51:53,129 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.728e+01 2.256e+01 2.446e+01 2.719e+01 3.472e+01, threshold=4.892e+01, percent-clipped=0.0
2024-08-20 10:51:57,196 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4777380.0, ans=0.0
2024-08-20 10:52:13,039 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=4777480.0, ans=0.025
2024-08-20 10:52:13,189 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.60 vs. limit=15.0
2024-08-20 10:52:16,033 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 19 from LS+wenet, 33 from Vox, 31 from AS
2024-08-20 10:52:33,246 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 33 from LS+wenet, 19 from Vox, 41 from AS
2024-08-20 10:52:46,499 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.45 vs. limit=15.0
2024-08-20 10:52:52,781 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 3600, loss[loss=0.1147, beats_loss=0.01002, ecapa_loss=0.0001274, whisper_loss=0.1034, over 23851.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01035, ecapa_loss=0.0001396, whisper_loss=0.09063, over 3858391.84 frames. ], batch size: 92, lr: 1.86e-03, grad_scale: 5.764607523034235e+17
2024-08-20 10:52:58,847 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts.
18 from LS+wenet, 25 from Vox, 32 from AS
2024-08-20 10:53:13,696 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.25 vs. limit=8.0
2024-08-20 10:53:15,639 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 21 from LS+wenet, 21 from Vox, 46 from AS
2024-08-20 10:53:15,898 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4777880.0, ans=0.05
2024-08-20 10:53:40,338 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4777980.0, ans=0.1
2024-08-20 10:53:47,494 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.047e-01
2024-08-20 10:53:59,970 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4778080.0, ans=0.0
2024-08-20 10:54:21,871 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 30 from LS+wenet, 21 from Vox, 30 from AS
2024-08-20 10:54:35,083 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 35 from LS+wenet, 17 from Vox, 42 from AS
2024-08-20 10:54:48,884 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 26 from LS+wenet, 27 from Vox, 40 from AS
2024-08-20 10:54:51,594 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 3650, loss[loss=0.1, beats_loss=0.01005, ecapa_loss=0.0001564, whisper_loss=0.0884, over 23314.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01035, ecapa_loss=0.0001391, whisper_loss=0.09025, over 3834730.73 frames.
], batch size: 93, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 10:55:33,429 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.703e+01 2.274e+01 2.510e+01 2.811e+01 1.402e+02, threshold=5.019e+01, percent-clipped=2.0
2024-08-20 10:55:49,480 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.49 vs. limit=10.0
2024-08-20 10:55:57,050 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.30 vs. limit=15.0
2024-08-20 10:56:14,672 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4778580.0, ans=0.125
2024-08-20 10:56:41,414 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 19 from LS+wenet, 16 from Vox, 30 from AS
2024-08-20 10:56:53,783 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 3700, loss[loss=0.09428, beats_loss=0.01166, ecapa_loss=0.0001466, whisper_loss=0.08115, over 20210.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01035, ecapa_loss=0.0001389, whisper_loss=0.08971, over 3801119.66 frames. ], batch size: 85, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 10:57:07,834 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4778780.0, ans=0.0
2024-08-20 10:57:25,513 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.25 vs. limit=15.0
2024-08-20 10:57:38,139 INFO [train_multi_KD3.py:845] (1/4) A total of 49 cuts. 13 from LS+wenet, 13 from Vox, 23 from AS
2024-08-20 10:58:44,854 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 3750, loss[loss=0.08576, beats_loss=0.01199, ecapa_loss=0.000115, whisper_loss=0.07262, over 22341.00 frames.
], tot_loss[loss=0.102, beats_loss=0.01033, ecapa_loss=0.0001391, whisper_loss=0.09026, over 3814205.26 frames. ], batch size: 91, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 10:59:00,961 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4779280.0, ans=0.1
2024-08-20 10:59:06,189 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4779280.0, ans=0.0
2024-08-20 10:59:09,034 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4779380.0, ans=0.0
2024-08-20 10:59:20,724 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4779380.0, ans=0.0
2024-08-20 10:59:28,242 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.845e+01 2.202e+01 2.451e+01 2.797e+01 4.489e+01, threshold=4.903e+01, percent-clipped=0.0
2024-08-20 10:59:34,717 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 17 from LS+wenet, 21 from Vox, 38 from AS
2024-08-20 10:59:58,178 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4779580.0, ans=0.0
2024-08-20 11:00:04,878 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts.
21 from LS+wenet, 13 from Vox, 33 from AS
2024-08-20 11:00:26,391 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4779680.0, ans=0.0
2024-08-20 11:00:33,099 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4779680.0, ans=0.0
2024-08-20 11:00:45,909 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4779680.0, ans=0.125
2024-08-20 11:00:50,445 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 3800, loss[loss=0.1119, beats_loss=0.01027, ecapa_loss=0.0001598, whisper_loss=0.1001, over 21693.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01038, ecapa_loss=0.0001399, whisper_loss=0.08961, over 3813257.63 frames. ], batch size: 92, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 11:00:50,839 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4779780.0, ans=0.1
2024-08-20 11:01:14,900 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4779880.0, ans=0.1
2024-08-20 11:01:22,140 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 16 from LS+wenet, 14 from Vox, 27 from AS
2024-08-20 11:02:27,419 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4780080.0, ans=0.2
2024-08-20 11:02:57,337 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 3850, loss[loss=0.08307, beats_loss=0.01256, ecapa_loss=0.0001281, whisper_loss=0.06923, over 15999.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01043, ecapa_loss=0.0001397, whisper_loss=0.08968, over 3807013.79 frames. ], batch size: 65, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 11:02:57,571 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts.
13 from LS+wenet, 15 from Vox, 22 from AS
2024-08-20 11:02:57,812 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=4780280.0, ans=10.0
2024-08-20 11:03:19,035 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4780380.0, ans=0.09899494936611666
2024-08-20 11:03:32,601 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.347e+01 2.624e+01 2.897e+01 4.079e+01, threshold=5.248e+01, percent-clipped=0.0
2024-08-20 11:03:57,159 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.47 vs. limit=22.5
2024-08-20 11:04:00,032 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 17 from LS+wenet, 23 from Vox, 34 from AS
2024-08-20 11:04:02,039 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 18 from LS+wenet, 22 from Vox, 25 from AS
2024-08-20 11:04:29,988 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4780680.0, ans=0.125
2024-08-20 11:04:40,743 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 3900, loss[loss=0.06704, beats_loss=0.01573, ecapa_loss=0.000112, whisper_loss=0.05019, over 16354.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01047, ecapa_loss=0.000139, whisper_loss=0.08952, over 3801644.55 frames. ], batch size: 68, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 11:04:47,558 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4780780.0, ans=0.2
2024-08-20 11:04:54,631 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.57 vs.
limit=10.0 2024-08-20 11:05:10,200 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4780880.0, ans=0.025 2024-08-20 11:05:11,673 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 32 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-20 11:05:21,345 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4780980.0, ans=0.125 2024-08-20 11:05:27,668 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 24 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-20 11:05:28,164 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.07 vs. limit=10.0 2024-08-20 11:05:39,955 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=4780980.0, ans=0.05 2024-08-20 11:06:05,437 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 18 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-20 11:06:31,150 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 3950, loss[loss=0.1315, beats_loss=0.00562, ecapa_loss=0.0001814, whisper_loss=0.1241, over 22265.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01042, ecapa_loss=0.000139, whisper_loss=0.09023, over 3819731.01 frames. ], batch size: 89, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:06:32,022 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.57 vs. 
limit=22.5 2024-08-20 11:06:36,323 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4781280.0, ans=0.0 2024-08-20 11:06:38,088 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4781280.0, ans=0.0 2024-08-20 11:06:41,844 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 26 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-20 11:07:08,289 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.749e+01 2.311e+01 2.584e+01 2.900e+01 3.704e+01, threshold=5.168e+01, percent-clipped=0.0 2024-08-20 11:07:08,543 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 32 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-20 11:07:36,434 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4781580.0, ans=0.125 2024-08-20 11:08:15,336 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 4000, loss[loss=0.1247, beats_loss=0.009457, ecapa_loss=0.0001581, whisper_loss=0.1137, over 17335.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01038, ecapa_loss=0.0001405, whisper_loss=0.09106, over 3868567.12 frames. ], batch size: 66, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:08:32,291 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4781780.0, ans=0.125 2024-08-20 11:08:34,723 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4781880.0, ans=0.125 2024-08-20 11:09:20,920 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4782080.0, ans=0.0 2024-08-20 11:09:25,842 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 
28 from LS+wenet, 31 from Vox, 34 fro AS 2024-08-20 11:09:51,731 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.15 vs. limit=22.5 2024-08-20 11:10:07,426 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 4050, loss[loss=0.09947, beats_loss=0.009802, ecapa_loss=0.0001364, whisper_loss=0.0883, over 19719.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01023, ecapa_loss=0.0001411, whisper_loss=0.09221, over 3873220.85 frames. ], batch size: 78, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:10:10,413 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.90 vs. limit=22.5 2024-08-20 11:10:11,988 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 25 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-20 11:10:40,422 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.45 vs. limit=12.0 2024-08-20 11:10:45,007 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 28 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-20 11:10:45,353 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4782380.0, ans=0.2 2024-08-20 11:10:50,227 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.323e+01 2.567e+01 2.848e+01 3.981e+01, threshold=5.134e+01, percent-clipped=0.0 2024-08-20 11:11:20,547 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4782580.0, ans=0.2 2024-08-20 11:11:29,235 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4782580.0, ans=0.125 2024-08-20 11:11:33,671 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 
30 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-20 11:11:42,312 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4782580.0, ans=0.1 2024-08-20 11:12:09,985 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 4100, loss[loss=0.1177, beats_loss=0.01048, ecapa_loss=0.0001015, whisper_loss=0.1062, over 24148.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0103, ecapa_loss=0.0001402, whisper_loss=0.09191, over 3878390.03 frames. ], batch size: 89, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:12:24,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4782780.0, ans=0.0 2024-08-20 11:12:25,658 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4782780.0, ans=0.2 2024-08-20 11:12:51,559 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4782980.0, ans=0.1 2024-08-20 11:12:57,785 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.07 vs. limit=22.5 2024-08-20 11:13:05,604 INFO [train_multi_KD3.py:845] (1/4) A total of 95 cuts. 32 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-20 11:13:13,393 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.48 vs. 
limit=22.5 2024-08-20 11:13:14,868 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4783080.0, ans=0.1 2024-08-20 11:13:22,160 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 11:13:36,507 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4783180.0, ans=0.125 2024-08-20 11:13:39,698 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4783280.0, ans=0.0 2024-08-20 11:13:41,067 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 4150, loss[loss=0.1012, beats_loss=0.01075, ecapa_loss=0.0001004, whisper_loss=0.08948, over 18879.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01026, ecapa_loss=0.0001398, whisper_loss=0.09163, over 3838588.62 frames. ], batch size: 71, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:13:41,514 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4783280.0, ans=0.0 2024-08-20 11:13:47,640 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.87 vs. limit=15.0 2024-08-20 11:13:50,275 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 29 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-20 11:14:11,325 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.393e+01 2.670e+01 3.158e+01 1.265e+02, threshold=5.340e+01, percent-clipped=2.0 2024-08-20 11:14:12,481 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4783380.0, ans=0.5 2024-08-20 11:14:29,067 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 
17 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-20 11:14:39,617 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 20 from LS+wenet, 9 from Vox, 21 fro AS 2024-08-20 11:15:08,198 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 4200, loss[loss=0.1081, beats_loss=0.01157, ecapa_loss=0.0001264, whisper_loss=0.09524, over 22256.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01035, ecapa_loss=0.0001405, whisper_loss=0.09059, over 3826648.97 frames. ], batch size: 89, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:15:21,621 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 11:15:32,160 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.26 vs. limit=6.0 2024-08-20 11:15:42,197 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4783980.0, ans=0.125 2024-08-20 11:15:42,252 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4783980.0, ans=0.09899494936611666 2024-08-20 11:15:43,521 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 18 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-20 11:16:32,629 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 21 from LS+wenet, 23 from Vox, 49 fro AS 2024-08-20 11:16:37,533 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 4250, loss[loss=0.126, beats_loss=0.00895, ecapa_loss=0.0001523, whisper_loss=0.1156, over 22646.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01036, ecapa_loss=0.0001406, whisper_loss=0.09098, over 3863564.57 frames. 
], batch size: 88, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:16:50,938 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4784280.0, ans=0.0 2024-08-20 11:16:59,972 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4784380.0, ans=0.1 2024-08-20 11:17:07,734 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.233e+01 2.477e+01 2.798e+01 4.198e+01, threshold=4.955e+01, percent-clipped=0.0 2024-08-20 11:17:17,081 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4784480.0, ans=0.95 2024-08-20 11:17:25,700 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 13 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-20 11:18:05,736 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 4300, loss[loss=0.09133, beats_loss=0.01455, ecapa_loss=0.0001214, whisper_loss=0.07556, over 23129.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01039, ecapa_loss=0.0001408, whisper_loss=0.09043, over 3836853.72 frames. ], batch size: 94, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:18:23,664 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 23 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-20 11:18:36,998 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.33 vs. limit=15.0 2024-08-20 11:18:55,917 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4784980.0, ans=0.0 2024-08-20 11:19:21,430 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.82 vs. 
limit=22.5 2024-08-20 11:19:32,882 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 4350, loss[loss=0.07985, beats_loss=0.01255, ecapa_loss=0.0001404, whisper_loss=0.0659, over 18621.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01039, ecapa_loss=0.0001397, whisper_loss=0.09035, over 3810312.18 frames. ], batch size: 78, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:19:35,371 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4785280.0, ans=0.0 2024-08-20 11:19:53,422 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.53 vs. limit=15.0 2024-08-20 11:20:00,251 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.60 vs. limit=22.5 2024-08-20 11:20:02,423 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.221e+01 2.524e+01 2.843e+01 5.218e+01, threshold=5.048e+01, percent-clipped=1.0 2024-08-20 11:20:17,586 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4785480.0, ans=0.0 2024-08-20 11:20:19,787 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.88 vs. limit=22.5 2024-08-20 11:20:47,838 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 27 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-20 11:21:01,238 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 4400, loss[loss=0.1109, beats_loss=0.00874, ecapa_loss=0.0001183, whisper_loss=0.101, over 23789.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01033, ecapa_loss=0.0001392, whisper_loss=0.0912, over 3844586.83 frames. 
], batch size: 91, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:21:09,103 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 11:21:09,123 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 11:21:24,746 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4785880.0, ans=0.0 2024-08-20 11:21:26,605 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4785880.0, ans=0.125 2024-08-20 11:21:36,999 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4785980.0, ans=0.0 2024-08-20 11:21:38,558 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 15 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-20 11:21:40,640 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4785980.0, ans=0.125 2024-08-20 11:21:42,002 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4785980.0, ans=0.0 2024-08-20 11:21:50,739 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4785980.0, ans=0.0 2024-08-20 11:22:00,571 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 
26 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-20 11:22:03,528 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn1.whiten.whitening_limit, batch_count=4786080.0, ans=22.5 2024-08-20 11:22:07,114 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4786080.0, ans=0.125 2024-08-20 11:22:10,325 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4786080.0, ans=0.125 2024-08-20 11:22:12,242 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4786180.0, ans=0.125 2024-08-20 11:22:17,418 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4786180.0, ans=0.125 2024-08-20 11:22:20,802 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4786180.0, ans=0.95 2024-08-20 11:22:27,456 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 23 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-20 11:22:30,823 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 4450, loss[loss=0.1156, beats_loss=0.01005, ecapa_loss=0.0001578, whisper_loss=0.1039, over 22884.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01034, ecapa_loss=0.0001392, whisper_loss=0.0911, over 3864156.88 frames. ], batch size: 93, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:22:48,228 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4786380.0, ans=0.0 2024-08-20 11:22:50,541 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=4786380.0, ans=22.5 2024-08-20 11:22:51,855 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 
29 from LS+wenet, 10 from Vox, 34 fro AS 2024-08-20 11:22:57,399 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4786380.0, ans=0.125 2024-08-20 11:23:00,231 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.756e+01 2.276e+01 2.505e+01 2.862e+01 6.840e+01, threshold=5.011e+01, percent-clipped=2.0 2024-08-20 11:23:03,910 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 21 from LS+wenet, 21 from Vox, 48 fro AS 2024-08-20 11:23:37,716 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4786580.0, ans=0.05 2024-08-20 11:23:46,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4786680.0, ans=0.125 2024-08-20 11:23:57,870 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 4500, loss[loss=0.1036, beats_loss=0.0121, ecapa_loss=0.0001317, whisper_loss=0.09016, over 18256.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01037, ecapa_loss=0.00014, whisper_loss=0.09025, over 3812498.05 frames. ], batch size: 73, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:23:58,103 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 18 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-20 11:24:07,329 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4786780.0, ans=0.0 2024-08-20 11:24:09,688 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.63 vs. limit=22.5 2024-08-20 11:24:20,791 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 18 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-20 11:24:37,484 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 
24 from LS+wenet, 13 from Vox, 19 fro AS 2024-08-20 11:24:42,373 WARNING [optim.py:496] (1/4) Scaling gradients by 0.06896068155765533, model_norm_threshold=50.106773376464844 2024-08-20 11:24:42,531 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.09, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.994e+04, grad_sumsq=4.994e+04, orig_rms_sq=1.000e+00 2024-08-20 11:24:48,471 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4787080.0, ans=0.125 2024-08-20 11:24:51,292 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 27 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-20 11:25:00,727 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0 2024-08-20 11:25:25,024 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 4550, loss[loss=0.1045, beats_loss=0.008889, ecapa_loss=0.0001501, whisper_loss=0.09411, over 15358.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01041, ecapa_loss=0.0001403, whisper_loss=0.08965, over 3770536.10 frames. ], batch size: 61, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:25:38,333 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4787280.0, ans=0.1 2024-08-20 11:25:56,150 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.635e+01 2.231e+01 2.496e+01 2.825e+01 7.266e+02, threshold=4.992e+01, percent-clipped=1.0 2024-08-20 11:25:57,212 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.32 vs. limit=15.0 2024-08-20 11:26:11,162 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 
28 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-20 11:26:22,300 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 29 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-20 11:26:33,718 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.16 vs. limit=15.0 2024-08-20 11:26:43,559 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4787680.0, ans=0.125 2024-08-20 11:26:55,702 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 4600, loss[loss=0.1045, beats_loss=0.009755, ecapa_loss=0.000148, whisper_loss=0.09329, over 21775.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01038, ecapa_loss=0.0001399, whisper_loss=0.08978, over 3786938.21 frames. ], batch size: 87, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:27:14,263 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4787880.0, ans=0.125 2024-08-20 11:27:32,413 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 20 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-20 11:27:51,372 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4788080.0, ans=0.125 2024-08-20 11:27:54,935 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 25 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-20 11:27:56,543 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4788080.0, ans=0.125 2024-08-20 11:28:03,405 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 
32 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-20 11:28:17,381 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4788180.0, ans=0.0 2024-08-20 11:28:26,273 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 4650, loss[loss=0.1049, beats_loss=0.009011, ecapa_loss=0.0001361, whisper_loss=0.09452, over 21951.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0104, ecapa_loss=0.0001403, whisper_loss=0.08957, over 3835263.30 frames. ], batch size: 88, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:28:37,478 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4788280.0, ans=0.1 2024-08-20 11:28:43,183 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4788380.0, ans=0.125 2024-08-20 11:28:56,888 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.578e+01 2.344e+01 2.573e+01 2.789e+01 3.818e+01, threshold=5.145e+01, percent-clipped=0.0 2024-08-20 11:29:13,840 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4788480.0, ans=0.04949747468305833 2024-08-20 11:29:39,861 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-20 11:29:39,866 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4788680.0, ans=0.125 2024-08-20 11:29:43,058 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4788680.0, ans=0.125 2024-08-20 11:29:54,667 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 
22 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-20 11:29:56,538 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 4700, loss[loss=0.09146, beats_loss=0.0118, ecapa_loss=0.0001373, whisper_loss=0.07828, over 19825.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01042, ecapa_loss=0.0001382, whisper_loss=0.0892, over 3797514.35 frames. ], batch size: 81, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:30:18,042 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4788880.0, ans=0.2 2024-08-20 11:30:19,162 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 22 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-20 11:31:06,811 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.33 vs. limit=22.5 2024-08-20 11:31:10,084 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.40 vs. limit=22.5 2024-08-20 11:31:19,726 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 27 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-20 11:31:21,489 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4789280.0, ans=0.125 2024-08-20 11:31:22,459 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 4750, loss[loss=0.1042, beats_loss=0.009348, ecapa_loss=0.0001644, whisper_loss=0.09326, over 19226.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01036, ecapa_loss=0.0001392, whisper_loss=0.0892, over 3782432.73 frames. ], batch size: 82, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:31:35,632 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 11:31:47,771 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 
13 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-20 11:31:52,609 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.256e+01 2.490e+01 2.747e+01 3.725e+01, threshold=4.981e+01, percent-clipped=0.0 2024-08-20 11:32:01,561 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 26 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-20 11:32:05,837 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.71 vs. limit=15.0 2024-08-20 11:32:18,766 INFO [train_multi_KD3.py:845] (1/4) A total of 96 cuts. 29 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-20 11:32:55,387 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 4800, loss[loss=0.1078, beats_loss=0.008987, ecapa_loss=0.0001473, whisper_loss=0.09734, over 21149.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01042, ecapa_loss=0.0001393, whisper_loss=0.0888, over 3774540.34 frames. ], batch size: 89, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:33:06,372 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 20 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-20 11:33:10,919 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4789780.0, ans=0.125 2024-08-20 11:33:31,078 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 29 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-20 11:34:06,496 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4790080.0, ans=0.125 2024-08-20 11:34:42,505 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 4850, loss[loss=0.123, beats_loss=0.008938, ecapa_loss=0.000121, whisper_loss=0.1129, over 23765.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.0104, ecapa_loss=0.0001405, whisper_loss=0.08931, over 3768180.64 frames. 
], batch size: 89, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 11:35:27,413 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.728e+01 2.360e+01 2.620e+01 2.940e+01 4.009e+01, threshold=5.240e+01, percent-clipped=0.0
2024-08-20 11:35:44,801 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.21 vs. limit=15.0
2024-08-20 11:36:04,463 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.01 vs. limit=6.0
2024-08-20 11:36:14,865 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4790580.0, ans=0.2
2024-08-20 11:36:57,846 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 4900, loss[loss=0.0824, beats_loss=0.01064, ecapa_loss=0.0001538, whisper_loss=0.07023, over 17378.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01039, ecapa_loss=0.00014, whisper_loss=0.08962, over 3778974.83 frames. ], batch size: 75, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 11:37:26,614 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4790880.0, ans=0.125
2024-08-20 11:37:41,697 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4790880.0, ans=0.125
2024-08-20 11:38:10,279 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 23 from LS+wenet, 17 from Vox, 23 from AS
2024-08-20 11:38:43,935 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 12 from LS+wenet, 20 from Vox, 26 from AS
2024-08-20 11:38:50,817 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4791180.0, ans=0.2
2024-08-20 11:39:11,004 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 4950, loss[loss=0.08813, beats_loss=0.01178, ecapa_loss=9.96e-05, whisper_loss=0.07535, over 15284.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01044, ecapa_loss=0.0001397, whisper_loss=0.08907, over 3790131.65 frames. ], batch size: 59, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 11:39:16,603 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 18 from LS+wenet, 22 from Vox, 45 from AS
2024-08-20 11:39:22,059 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.20 vs. limit=12.0
2024-08-20 11:39:54,448 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.012e+01 2.340e+01 2.516e+01 2.750e+01 4.505e+01, threshold=5.032e+01, percent-clipped=0.0
2024-08-20 11:40:51,871 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4791680.0, ans=0.2
2024-08-20 11:41:16,278 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 5000, loss[loss=0.1246, beats_loss=0.01035, ecapa_loss=0.0001086, whisper_loss=0.1131, over 21483.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01041, ecapa_loss=0.0001397, whisper_loss=0.0898, over 3808112.95 frames. ], batch size: 80, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 11:41:26,660 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4791780.0, ans=0.09899494936611666
2024-08-20 11:41:52,493 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.40 vs. limit=10.0
2024-08-20 11:41:54,503 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.06 vs. limit=15.0
2024-08-20 11:41:58,997 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4791880.0, ans=0.09899494936611666
2024-08-20 11:42:21,902 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4791980.0, ans=0.0
2024-08-20 11:42:41,144 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4792080.0, ans=0.125
2024-08-20 11:43:10,647 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.24 vs. limit=15.0
2024-08-20 11:43:18,651 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 5050, loss[loss=0.09943, beats_loss=0.01227, ecapa_loss=0.0001204, whisper_loss=0.08596, over 14815.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01036, ecapa_loss=0.0001406, whisper_loss=0.08954, over 3789954.93 frames. ], batch size: 57, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 11:43:45,191 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 27 from LS+wenet, 20 from Vox, 44 from AS
2024-08-20 11:43:58,844 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.57 vs. limit=15.0
2024-08-20 11:43:59,123 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.312e+01 2.479e+01 2.896e+01 1.864e+02, threshold=4.958e+01, percent-clipped=1.0
2024-08-20 11:44:28,618 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts.
29 from LS+wenet, 19 from Vox, 41 from AS
2024-08-20 11:44:46,108 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4792580.0, ans=0.2
2024-08-20 11:44:46,361 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.44 vs. limit=15.0
2024-08-20 11:45:00,528 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.51 vs. limit=15.0
2024-08-20 11:45:04,661 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4792680.0, ans=0.0
2024-08-20 11:45:16,461 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 5100, loss[loss=0.1187, beats_loss=0.01145, ecapa_loss=0.0001408, whisper_loss=0.1058, over 17528.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01042, ecapa_loss=0.0001397, whisper_loss=0.0892, over 3785749.28 frames. ], batch size: 70, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 11:45:27,309 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4792780.0, ans=0.125
2024-08-20 11:45:39,999 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4792880.0, ans=0.1
2024-08-20 11:46:24,218 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4792980.0, ans=0.0
2024-08-20 11:46:42,237 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4793080.0, ans=0.0
2024-08-20 11:46:55,465 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 20 from LS+wenet, 20 from Vox, 31 from AS
2024-08-20 11:47:19,406 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 5150, loss[loss=0.1045, beats_loss=0.01108, ecapa_loss=0.0001336, whisper_loss=0.09203, over 23371.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01042, ecapa_loss=0.0001413, whisper_loss=0.08939, over 3791135.37 frames. ], batch size: 93, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 11:47:41,534 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4793280.0, ans=0.125
2024-08-20 11:48:01,219 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.665e+01 2.291e+01 2.565e+01 2.823e+01 3.881e+01, threshold=5.130e+01, percent-clipped=0.0
2024-08-20 11:48:06,765 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 24 from LS+wenet, 18 from Vox, 32 from AS
2024-08-20 11:48:07,625 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4793480.0, ans=0.1
2024-08-20 11:48:08,876 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 23 from LS+wenet, 16 from Vox, 21 from AS
2024-08-20 11:48:10,060 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4793480.0, ans=0.1
2024-08-20 11:48:30,856 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 29 from LS+wenet, 20 from Vox, 27 from AS
2024-08-20 11:48:50,051 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.37 vs. limit=12.0
2024-08-20 11:49:02,113 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4793680.0, ans=0.0
2024-08-20 11:49:23,403 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 5200, loss[loss=0.1073, beats_loss=0.0102, ecapa_loss=0.0001087, whisper_loss=0.09601, over 15544.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01036, ecapa_loss=0.000142, whisper_loss=0.0904, over 3810607.16 frames. ], batch size: 59, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 11:49:37,766 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.58 vs. limit=15.0
2024-08-20 11:49:47,166 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 19 from LS+wenet, 24 from Vox, 33 from AS
2024-08-20 11:49:47,372 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4793880.0, ans=0.125
2024-08-20 11:50:02,352 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 21 from LS+wenet, 14 from Vox, 27 from AS
2024-08-20 11:50:02,641 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4793880.0, ans=0.1
2024-08-20 11:50:20,801 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=4793980.0, ans=15.0
2024-08-20 11:51:04,078 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.90 vs. limit=22.5
2024-08-20 11:51:12,820 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 22 from LS+wenet, 19 from Vox, 30 from AS
2024-08-20 11:51:13,644 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.08 vs. limit=22.5
2024-08-20 11:51:27,312 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 5250, loss[loss=0.101, beats_loss=0.008471, ecapa_loss=0.0001306, whisper_loss=0.09118, over 17145.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01043, ecapa_loss=0.0001407, whisper_loss=0.09063, over 3802771.54 frames.
], batch size: 63, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 11:51:28,162 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4794280.0, ans=0.1
2024-08-20 11:51:35,611 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4794280.0, ans=0.125
2024-08-20 11:51:50,938 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4794380.0, ans=0.0
2024-08-20 11:51:53,290 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4794380.0, ans=0.125
2024-08-20 11:52:06,352 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.92 vs. limit=15.0
2024-08-20 11:52:11,024 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.300e+01 2.574e+01 2.899e+01 1.239e+02, threshold=5.148e+01, percent-clipped=2.0
2024-08-20 11:52:22,529 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4794480.0, ans=0.0
2024-08-20 11:53:16,462 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.16 vs. limit=15.0
2024-08-20 11:53:31,836 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 5300, loss[loss=0.1032, beats_loss=0.006194, ecapa_loss=0.0001418, whisper_loss=0.09562, over 15451.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01038, ecapa_loss=0.0001387, whisper_loss=0.09076, over 3788169.23 frames. ], batch size: 57, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 11:53:32,060 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 28 from LS+wenet, 20 from Vox, 40 from AS
2024-08-20 11:53:32,385 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4794780.0, ans=0.1
2024-08-20 11:54:09,809 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 30 from LS+wenet, 22 from Vox, 36 from AS
2024-08-20 11:54:39,616 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4794980.0, ans=0.125
2024-08-20 11:54:39,692 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=4794980.0, ans=0.025
2024-08-20 11:54:51,425 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 18 from LS+wenet, 17 from Vox, 27 from AS
2024-08-20 11:55:29,312 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 5350, loss[loss=0.1273, beats_loss=0.008327, ecapa_loss=0.0001538, whisper_loss=0.1174, over 16974.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01031, ecapa_loss=0.0001399, whisper_loss=0.09072, over 3776715.08 frames. ], batch size: 66, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 11:56:10,465 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.312e+01 2.539e+01 2.746e+01 3.720e+01, threshold=5.079e+01, percent-clipped=0.0
2024-08-20 11:56:10,959 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4795380.0, ans=0.125
2024-08-20 11:56:13,426 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4795380.0, ans=0.1
2024-08-20 11:57:00,580 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 42 from LS+wenet, 18 from Vox, 32 from AS
2024-08-20 11:57:15,405 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 15 from LS+wenet, 17 from Vox, 27 from AS
2024-08-20 11:57:32,239 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 5400, loss[loss=0.1052, beats_loss=0.009592, ecapa_loss=0.0001349, whisper_loss=0.09429, over 21485.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01029, ecapa_loss=0.0001391, whisper_loss=0.09067, over 3822365.64 frames. ], batch size: 84, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 11:57:43,204 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4795780.0, ans=0.1
2024-08-20 11:58:04,716 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4795880.0, ans=0.125
2024-08-20 11:58:10,833 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 17 from LS+wenet, 11 from Vox, 23 from AS
2024-08-20 11:58:12,699 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.39 vs. limit=15.0
2024-08-20 11:58:17,504 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4795880.0, ans=0.125
2024-08-20 11:58:35,670 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 25 from LS+wenet, 20 from Vox, 34 from AS
2024-08-20 11:58:43,243 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 35 from LS+wenet, 22 from Vox, 31 from AS
2024-08-20 11:58:50,964 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 27 from LS+wenet, 23 from Vox, 37 from AS
2024-08-20 11:58:51,249 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4796080.0, ans=0.125
2024-08-20 11:58:55,416 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 27 from LS+wenet, 8 from Vox, 30 from AS
2024-08-20 11:59:22,460 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts.
23 from LS+wenet, 27 from Vox, 31 from AS
2024-08-20 11:59:35,891 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 5450, loss[loss=0.09367, beats_loss=0.01054, ecapa_loss=0.0001648, whisper_loss=0.08149, over 21714.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01027, ecapa_loss=0.0001389, whisper_loss=0.09037, over 3784126.65 frames. ], batch size: 91, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 11:59:36,323 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4796280.0, ans=0.04949747468305833
2024-08-20 11:59:52,461 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 31 from LS+wenet, 19 from Vox, 28 from AS
2024-08-20 12:00:14,475 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 19 from LS+wenet, 19 from Vox, 31 from AS
2024-08-20 12:00:18,903 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.281e+01 2.461e+01 2.740e+01 4.925e+01, threshold=4.922e+01, percent-clipped=0.0
2024-08-20 12:00:51,574 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 22 from LS+wenet, 27 from Vox, 43 from AS
2024-08-20 12:00:51,717 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4796580.0, ans=0.2
2024-08-20 12:01:18,623 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4796680.0, ans=0.125
2024-08-20 12:01:43,294 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 5500, loss[loss=0.1045, beats_loss=0.01073, ecapa_loss=0.0001491, whisper_loss=0.09228, over 21672.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0103, ecapa_loss=0.0001387, whisper_loss=0.09065, over 3805165.98 frames. ], batch size: 89, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 12:02:07,054 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4796880.0, ans=0.125
2024-08-20 12:02:12,358 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4796880.0, ans=0.125
2024-08-20 12:02:12,429 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4796880.0, ans=0.2
2024-08-20 12:02:40,952 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4796980.0, ans=0.1
2024-08-20 12:03:11,687 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 20 from LS+wenet, 23 from Vox, 26 from AS
2024-08-20 12:03:20,290 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 25 from LS+wenet, 22 from Vox, 34 from AS
2024-08-20 12:03:20,465 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4797180.0, ans=0.125
2024-08-20 12:03:23,509 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 14 from LS+wenet, 18 from Vox, 21 from AS
2024-08-20 12:03:30,927 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4797180.0, ans=0.0
2024-08-20 12:03:33,397 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 20 from LS+wenet, 22 from Vox, 43 from AS
2024-08-20 12:03:45,336 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 5550, loss[loss=0.1143, beats_loss=0.01057, ecapa_loss=0.0001013, whisper_loss=0.1027, over 22178.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01023, ecapa_loss=0.0001392, whisper_loss=0.09068, over 3823746.71 frames. ], batch size: 83, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 12:03:45,963 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.196e-01
2024-08-20 12:04:08,996 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4797380.0, ans=0.125
2024-08-20 12:04:12,053 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-20 12:04:31,028 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.285e+01 2.426e+01 2.728e+01 7.340e+01, threshold=4.852e+01, percent-clipped=2.0
2024-08-20 12:04:42,163 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 32 from LS+wenet, 20 from Vox, 37 from AS
2024-08-20 12:04:42,379 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4797480.0, ans=0.1
2024-08-20 12:05:29,385 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4797680.0, ans=0.0
2024-08-20 12:05:39,369 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.07 vs. limit=15.0
2024-08-20 12:05:46,949 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4797680.0, ans=0.125
2024-08-20 12:05:53,997 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 5600, loss[loss=0.07896, beats_loss=0.01109, ecapa_loss=0.0001407, whisper_loss=0.06647, over 18801.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01019, ecapa_loss=0.0001391, whisper_loss=0.09093, over 3842016.47 frames.
], batch size: 76, lr: 1.86e-03, grad_scale: 5.764607523034235e+17
2024-08-20 12:05:57,584 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4797780.0, ans=0.0
2024-08-20 12:05:59,768 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.84 vs. limit=15.0
2024-08-20 12:06:03,270 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 16 from LS+wenet, 14 from Vox, 28 from AS
2024-08-20 12:07:21,838 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4798080.0, ans=0.0
2024-08-20 12:07:46,051 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4798180.0, ans=0.1
2024-08-20 12:07:48,211 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 23 from LS+wenet, 27 from Vox, 39 from AS
2024-08-20 12:08:03,400 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 26 from LS+wenet, 19 from Vox, 39 from AS
2024-08-20 12:08:08,255 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 5650, loss[loss=0.1016, beats_loss=0.01092, ecapa_loss=0.0001497, whisper_loss=0.08913, over 20802.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01031, ecapa_loss=0.0001399, whisper_loss=0.09009, over 3871565.45 frames. ], batch size: 87, lr: 1.85e-03, grad_scale: 5.764607523034235e+17
2024-08-20 12:08:34,731 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 18 from LS+wenet, 23 from Vox, 23 from AS
2024-08-20 12:08:42,188 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4798380.0, ans=0.2
2024-08-20 12:08:49,683 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4798380.0, ans=0.0
2024-08-20 12:08:51,778 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.277e+01 2.490e+01 2.736e+01 3.914e+01, threshold=4.980e+01, percent-clipped=0.0
2024-08-20 12:09:02,850 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 21 from LS+wenet, 23 from Vox, 35 from AS
2024-08-20 12:09:23,603 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 29 from LS+wenet, 23 from Vox, 40 from AS
2024-08-20 12:10:10,231 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 5700, loss[loss=0.1128, beats_loss=0.009357, ecapa_loss=0.0001209, whisper_loss=0.1022, over 17138.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01033, ecapa_loss=0.0001393, whisper_loss=0.09019, over 3869532.61 frames. ], batch size: 62, lr: 1.85e-03, grad_scale: 5.764607523034235e+17
2024-08-20 12:10:29,742 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 23 from LS+wenet, 24 from Vox, 40 from AS
2024-08-20 12:10:48,602 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4798880.0, ans=0.0
2024-08-20 12:10:51,333 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.29 vs. limit=15.0
2024-08-20 12:10:52,047 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 19 from LS+wenet, 17 from Vox, 45 from AS
2024-08-20 12:11:02,517 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4798980.0, ans=0.0
2024-08-20 12:11:10,255 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4798980.0, ans=0.125
2024-08-20 12:11:16,961 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4798980.0, ans=0.0
2024-08-20 12:11:29,473 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4799080.0, ans=0.1
2024-08-20 12:12:00,504 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4799180.0, ans=0.09899494936611666
2024-08-20 12:12:09,994 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 5750, loss[loss=0.08237, beats_loss=0.01152, ecapa_loss=0.0001609, whisper_loss=0.06924, over 19582.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01032, ecapa_loss=0.0001408, whisper_loss=0.08959, over 3847497.26 frames. ], batch size: 84, lr: 1.85e-03, grad_scale: 5.764607523034235e+17
2024-08-20 12:12:12,508 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 25 from LS+wenet, 18 from Vox, 33 from AS
2024-08-20 12:12:39,931 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 19 from LS+wenet, 28 from Vox, 41 from AS
2024-08-20 12:12:47,744 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.87 vs. limit=15.0
2024-08-20 12:12:51,239 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.328e+01 2.562e+01 2.759e+01 3.925e+01, threshold=5.124e+01, percent-clipped=0.0
2024-08-20 12:13:49,912 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts.
29 from LS+wenet, 18 from Vox, 33 from AS
2024-08-20 12:14:11,920 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4799780.0, ans=0.0
2024-08-20 12:14:13,751 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 5800, loss[loss=0.09658, beats_loss=0.01196, ecapa_loss=0.0001114, whisper_loss=0.0835, over 19049.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01034, ecapa_loss=0.0001414, whisper_loss=0.08922, over 3838553.46 frames. ], batch size: 73, lr: 1.85e-03, grad_scale: 5.764607523034235e+17
2024-08-20 12:14:32,383 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 13 from LS+wenet, 13 from Vox, 29 from AS
2024-08-20 12:14:40,324 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4799880.0, ans=0.125
2024-08-20 12:14:44,606 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4799880.0, ans=0.125
2024-08-20 12:14:57,637 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=4799880.0, ans=0.5
2024-08-20 12:14:57,655 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4799880.0, ans=0.04949747468305833
2024-08-20 12:15:16,990 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.88 vs. limit=15.0
2024-08-20 12:15:35,229 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 26 from LS+wenet, 31 from Vox, 34 from AS
2024-08-20 12:15:41,759 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4800080.0, ans=0.0
2024-08-20 12:15:53,727 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4800180.0, ans=0.5
2024-08-20 12:16:03,398 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4800180.0, ans=0.0
2024-08-20 12:16:18,524 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 5850, loss[loss=0.0779, beats_loss=0.01198, ecapa_loss=0.000125, whisper_loss=0.06467, over 19569.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01034, ecapa_loss=0.0001407, whisper_loss=0.08928, over 3851167.91 frames. ], batch size: 77, lr: 1.85e-03, grad_scale: 5.764607523034235e+17
2024-08-20 12:16:42,191 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 30 from LS+wenet, 20 from Vox, 44 from AS
2024-08-20 12:16:58,530 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.301e+01 2.506e+01 2.861e+01 6.399e+01, threshold=5.013e+01, percent-clipped=1.0
2024-08-20 12:17:00,974 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 31 from LS+wenet, 22 from Vox, 40 from AS
2024-08-20 12:17:30,488 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.40 vs. limit=15.0
2024-08-20 12:18:16,846 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 5900, loss[loss=0.1068, beats_loss=0.00924, ecapa_loss=0.0001417, whisper_loss=0.09617, over 17122.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01032, ecapa_loss=0.0001406, whisper_loss=0.08956, over 3859354.16 frames. ], batch size: 67, lr: 1.85e-03, grad_scale: 5.764607523034235e+17
2024-08-20 12:18:25,848 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 33 from LS+wenet, 13 from Vox, 46 from AS
2024-08-20 12:18:37,746 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 26 from LS+wenet, 16 from Vox, 33 from AS
2024-08-20 12:18:43,518 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4800880.0, ans=0.015
2024-08-20 12:18:43,519 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4800880.0, ans=0.1
2024-08-20 12:18:52,072 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 22 from LS+wenet, 31 from Vox, 39 from AS
2024-08-20 12:19:07,066 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4800980.0, ans=0.2
2024-08-20 12:19:29,540 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 15 from LS+wenet, 17 from Vox, 21 from AS
2024-08-20 12:19:30,340 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4801080.0, ans=0.1
2024-08-20 12:19:38,758 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 17 from LS+wenet, 30 from Vox, 33 from AS
2024-08-20 12:20:05,248 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4801180.0, ans=0.0
2024-08-20 12:20:15,121 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 5950, loss[loss=0.09204, beats_loss=0.01215, ecapa_loss=0.0001287, whisper_loss=0.0786, over 18959.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0103, ecapa_loss=0.0001407, whisper_loss=0.08965, over 3844406.54 frames. ], batch size: 76, lr: 1.85e-03, grad_scale: 5.764607523034235e+17
2024-08-20 12:20:24,081 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4801280.0, ans=0.125
2024-08-20 12:20:25,093 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts.
39 from LS+wenet, 18 from Vox, 34 from AS
2024-08-20 12:20:31,840 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 21 from LS+wenet, 10 from Vox, 56 from AS
2024-08-20 12:20:39,420 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=4801380.0, ans=0.05
2024-08-20 12:20:55,761 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.723e+01 2.276e+01 2.504e+01 2.875e+01 3.990e+01, threshold=5.008e+01, percent-clipped=0.0
2024-08-20 12:20:57,146 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4801380.0, ans=0.125
2024-08-20 12:20:59,269 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4801380.0, ans=0.125
2024-08-20 12:21:00,516 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4801480.0, ans=0.125
2024-08-20 12:21:12,560 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4801480.0, ans=0.2
2024-08-20 12:21:17,854 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 18 from LS+wenet, 20 from Vox, 21 from AS
2024-08-20 12:21:58,613 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4801680.0, ans=0.07
2024-08-20 12:22:06,131 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.11 vs. limit=15.0
2024-08-20 12:22:09,477 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 6000, loss[loss=0.09833, beats_loss=0.01224, ecapa_loss=0.0001221, whisper_loss=0.08487, over 22377.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01036, ecapa_loss=0.00014, whisper_loss=0.08899, over 3817687.26 frames. ], batch size: 87, lr: 1.85e-03, grad_scale: 5.764607523034235e+17
2024-08-20 12:22:09,477 INFO [train_multi_KD3.py:1140] (1/4) Computing validation loss
2024-08-20 12:22:45,771 INFO [train_multi_KD3.py:1150] (1/4) Epoch 33, validation on ASR_libri: loss=0.254, beats_loss=0, ecapa_loss=0.0005123, whisper_loss=0.2489, over 931116.00 frames.
2024-08-20 12:23:08,865 INFO [train_multi_KD3.py:1150] (1/4) Epoch 33, validation on SV_voxceleb1: loss=0.003913, beats_loss=0, ecapa_loss=0.0003913, whisper_loss=0, over 944235.00 frames.
2024-08-20 12:24:01,124 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.8118, 4.6229, 4.1422, 4.6240], device='cuda:1')
2024-08-20 12:24:11,513 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.9288, 3.4460, 2.3627, 3.8286], device='cuda:1')
2024-08-20 12:24:42,561 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.0024, 3.8749, 3.4093, 3.7903], device='cuda:1')
2024-08-20 12:24:44,090 INFO [train_multi_KD3.py:1150] (1/4) Epoch 33, validation on AT_audioset: loss=0.02298, beats_loss=0.02298, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-20 12:24:44,093 INFO [train_multi_KD3.py:1156] (1/4) Maximum memory allocated so far is 30838MB
2024-08-20 12:25:00,743 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 19 from LS+wenet, 23 from Vox, 35 from AS
2024-08-20 12:25:56,375 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 23 from LS+wenet, 18 from Vox, 36 from AS
2024-08-20 12:26:13,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4802180.0, ans=0.125
2024-08-20 12:26:27,635 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 17 from LS+wenet, 17 from Vox, 17 from AS
2024-08-20 12:26:34,612 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 21 from LS+wenet, 22 from Vox, 28 from AS
2024-08-20 12:26:36,928 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 6050, loss[loss=0.1006, beats_loss=0.009229, ecapa_loss=0.0001605, whisper_loss=0.08972, over 17250.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01043, ecapa_loss=0.0001397, whisper_loss=0.08903, over 3841817.34 frames. ], batch size: 71, lr: 1.85e-03, grad_scale: 5.764607523034235e+17
2024-08-20 12:26:42,547 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.93 vs. limit=15.0
2024-08-20 12:27:04,461 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 18 from LS+wenet, 21 from Vox, 22 from AS
2024-08-20 12:27:10,162 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4802380.0, ans=0.125
2024-08-20 12:27:17,768 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.729e+01 2.267e+01 2.529e+01 2.771e+01 4.356e+01, threshold=5.058e+01, percent-clipped=0.0
2024-08-20 12:27:18,762 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-20 12:28:01,818 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4802580.0, ans=0.125
2024-08-20 12:28:06,768 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4802580.0, ans=0.125
2024-08-20 12:28:31,696 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4802680.0, ans=0.125
2024-08-20 12:28:35,569 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 6100, loss[loss=0.09956, beats_loss=0.0108, ecapa_loss=0.000153, whisper_loss=0.08723, over 
22465.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01042, ecapa_loss=0.0001399, whisper_loss=0.08905, over 3812282.61 frames. ], batch size: 93, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:28:35,752 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 27 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-20 12:29:05,975 INFO [train_multi_KD3.py:845] (1/4) A total of 95 cuts. 32 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-20 12:29:34,526 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4802980.0, ans=0.2 2024-08-20 12:29:34,844 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.24 vs. limit=22.5 2024-08-20 12:29:41,720 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.85 vs. limit=15.0 2024-08-20 12:30:18,667 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 37 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-20 12:30:29,498 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 6150, loss[loss=0.1258, beats_loss=0.008191, ecapa_loss=0.0001431, whisper_loss=0.1162, over 14732.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01039, ecapa_loss=0.0001404, whisper_loss=0.08958, over 3809395.40 frames. ], batch size: 57, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:30:54,872 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4803380.0, ans=0.125 2024-08-20 12:31:04,567 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 
30 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-20 12:31:07,003 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.293e+01 2.457e+01 2.787e+01 2.276e+02, threshold=4.913e+01, percent-clipped=4.0 2024-08-20 12:31:14,438 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 23 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-20 12:31:37,949 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.03 vs. limit=15.0 2024-08-20 12:31:43,733 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 15 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-20 12:31:47,877 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 25 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-20 12:32:21,282 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4803680.0, ans=0.1 2024-08-20 12:32:21,632 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.80 vs. limit=22.5 2024-08-20 12:32:26,731 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 6200, loss[loss=0.0854, beats_loss=0.01135, ecapa_loss=0.0001114, whisper_loss=0.07293, over 22095.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01041, ecapa_loss=0.00014, whisper_loss=0.0898, over 3817315.86 frames. ], batch size: 87, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:33:03,700 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4803880.0, ans=0.125 2024-08-20 12:33:46,178 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 21 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-20 12:33:57,176 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 25 from LS+wenet, 27 from Vox, 25 fro AS 2024-08-20 12:34:08,020 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 
25 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-20 12:34:09,586 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4804180.0, ans=0.125 2024-08-20 12:34:18,633 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4804180.0, ans=0.125 2024-08-20 12:34:21,066 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4804280.0, ans=0.1 2024-08-20 12:34:21,871 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 6250, loss[loss=0.09057, beats_loss=0.0112, ecapa_loss=0.0001349, whisper_loss=0.07802, over 23262.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01047, ecapa_loss=0.0001382, whisper_loss=0.089, over 3804321.12 frames. ], batch size: 95, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:34:22,050 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 26 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-20 12:34:23,677 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4804280.0, ans=0.125 2024-08-20 12:34:29,626 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 37 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-20 12:34:31,377 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4804280.0, ans=0.5 2024-08-20 12:34:35,781 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4804280.0, ans=0.1 2024-08-20 12:34:40,757 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4804280.0, ans=0.0 2024-08-20 12:34:53,753 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 
26 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-20 12:35:02,552 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.372e+01 2.624e+01 2.911e+01 4.545e+01, threshold=5.248e+01, percent-clipped=0.0 2024-08-20 12:35:27,917 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 27 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-20 12:35:38,150 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 18 from LS+wenet, 24 from Vox, 49 fro AS 2024-08-20 12:35:48,699 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4804580.0, ans=0.125 2024-08-20 12:35:48,809 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4804580.0, ans=0.1 2024-08-20 12:36:17,585 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 6300, loss[loss=0.103, beats_loss=0.009617, ecapa_loss=0.0001308, whisper_loss=0.09209, over 22309.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01041, ecapa_loss=0.0001406, whisper_loss=0.08959, over 3819851.56 frames. ], batch size: 89, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:36:26,663 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4804780.0, ans=0.0 2024-08-20 12:36:28,995 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4804780.0, ans=0.04949747468305833 2024-08-20 12:36:33,807 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.43 vs. limit=15.0 2024-08-20 12:37:04,207 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4804980.0, ans=0.125 2024-08-20 12:37:41,833 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 
33 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-20 12:37:51,440 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4805180.0, ans=0.0 2024-08-20 12:37:54,316 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 21 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-20 12:37:55,223 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.90 vs. limit=15.0 2024-08-20 12:38:13,158 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 6350, loss[loss=0.09475, beats_loss=0.01155, ecapa_loss=0.0001127, whisper_loss=0.08207, over 13373.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01041, ecapa_loss=0.0001388, whisper_loss=0.08951, over 3824783.32 frames. ], batch size: 50, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:38:18,256 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4805280.0, ans=0.125 2024-08-20 12:38:23,379 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4805280.0, ans=0.0 2024-08-20 12:38:25,633 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4805280.0, ans=0.125 2024-08-20 12:38:49,825 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4805380.0, ans=0.1 2024-08-20 12:38:53,957 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.680e+01 2.243e+01 2.449e+01 2.846e+01 7.911e+01, threshold=4.899e+01, percent-clipped=2.0 2024-08-20 12:39:04,887 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.53 vs. 
limit=15.0 2024-08-20 12:39:17,517 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4805480.0, ans=0.2 2024-08-20 12:39:24,757 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=4805580.0, ans=0.05 2024-08-20 12:40:15,135 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 6400, loss[loss=0.1121, beats_loss=0.009301, ecapa_loss=0.0001279, whisper_loss=0.1015, over 18519.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0104, ecapa_loss=0.0001399, whisper_loss=0.08983, over 3834268.90 frames. ], batch size: 71, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:40:23,048 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.49 vs. limit=15.0 2024-08-20 12:40:25,215 WARNING [optim.py:496] (1/4) Scaling gradients by 0.015770763158798218, model_norm_threshold=48.98588180541992 2024-08-20 12:40:25,368 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.0.self_attn_weights.in_proj.bias with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.332e+06, grad_sumsq=1.479e+05, orig_rms_sq=9.003e+00 2024-08-20 12:40:25,571 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 31 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-20 12:40:46,274 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4805880.0, ans=0.125 2024-08-20 12:40:53,917 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 26 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-20 12:40:56,382 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 
14 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-20 12:41:22,903 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4806080.0, ans=0.0 2024-08-20 12:41:31,628 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4806080.0, ans=0.0 2024-08-20 12:41:39,158 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 27 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-20 12:42:02,482 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=4806180.0, ans=0.05 2024-08-20 12:42:10,947 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 6450, loss[loss=0.1, beats_loss=0.01097, ecapa_loss=0.0001582, whisper_loss=0.08747, over 21853.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01036, ecapa_loss=0.0001411, whisper_loss=0.09024, over 3857057.60 frames. ], batch size: 91, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:42:23,243 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4806280.0, ans=0.0 2024-08-20 12:42:37,692 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4806380.0, ans=0.125 2024-08-20 12:42:40,469 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4806380.0, ans=0.0 2024-08-20 12:42:48,271 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.44 vs. 
limit=15.0 2024-08-20 12:42:58,439 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.273e+01 2.543e+01 2.928e+01 3.106e+03, threshold=5.086e+01, percent-clipped=1.0 2024-08-20 12:43:06,487 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4806480.0, ans=0.09899494936611666 2024-08-20 12:43:16,377 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 19 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-20 12:43:18,440 WARNING [optim.py:496] (1/4) Scaling gradients by 0.07709788531064987, model_norm_threshold=50.86475372314453 2024-08-20 12:43:18,597 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.1.norm.log_scale with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.952e+04, grad_sumsq=6.952e+04, orig_rms_sq=1.000e+00 2024-08-20 12:43:31,199 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4806580.0, ans=0.2 2024-08-20 12:43:32,254 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 25 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-20 12:44:07,337 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 6500, loss[loss=0.09969, beats_loss=0.009078, ecapa_loss=0.0001327, whisper_loss=0.08929, over 16648.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01037, ecapa_loss=0.0001412, whisper_loss=0.09049, over 3830721.77 frames. ], batch size: 64, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:44:18,307 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 
20 from LS+wenet, 15 from Vox, 15 fro AS 2024-08-20 12:44:20,285 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4806780.0, ans=0.125 2024-08-20 12:44:23,758 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4806780.0, ans=0.125 2024-08-20 12:44:34,341 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 29 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-20 12:44:36,317 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4806880.0, ans=0.125 2024-08-20 12:44:40,349 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4806880.0, ans=0.125 2024-08-20 12:44:43,532 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 21 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-20 12:44:45,002 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 26 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-20 12:44:58,722 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 18 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-20 12:45:15,805 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4807080.0, ans=0.015 2024-08-20 12:45:37,656 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4807180.0, ans=0.125 2024-08-20 12:45:40,563 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 6550, loss[loss=0.09341, beats_loss=0.01271, ecapa_loss=0.0001009, whisper_loss=0.0797, over 23306.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01044, ecapa_loss=0.0001407, whisper_loss=0.09003, over 3830697.31 frames. ], batch size: 93, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:45:43,750 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 
27 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-20 12:45:47,264 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 20 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-20 12:46:10,080 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.595e+01 2.299e+01 2.491e+01 2.817e+01 6.597e+02, threshold=4.982e+01, percent-clipped=1.0 2024-08-20 12:46:11,901 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 18 from LS+wenet, 19 from Vox, 16 fro AS 2024-08-20 12:46:16,250 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 25 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-20 12:46:19,200 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4807480.0, ans=0.125 2024-08-20 12:46:19,203 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4807480.0, ans=0.125 2024-08-20 12:46:55,020 INFO [train_multi_KD3.py:845] (1/4) A total of 54 cuts. 22 from LS+wenet, 15 from Vox, 17 fro AS 2024-08-20 12:47:30,845 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4807780.0, ans=0.125 2024-08-20 12:47:32,727 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 6600, loss[loss=0.09988, beats_loss=0.01197, ecapa_loss=0.0001403, whisper_loss=0.0865, over 20966.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01035, ecapa_loss=0.0001412, whisper_loss=0.09078, over 3845079.33 frames. 
], batch size: 83, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:47:38,237 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4807780.0, ans=0.125 2024-08-20 12:48:09,486 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4807880.0, ans=0.125 2024-08-20 12:48:20,456 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.28 vs. limit=15.0 2024-08-20 12:49:14,673 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 14 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-20 12:49:19,715 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4808180.0, ans=0.125 2024-08-20 12:49:36,571 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 6650, loss[loss=0.1004, beats_loss=0.01, ecapa_loss=0.0001545, whisper_loss=0.08881, over 21693.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01028, ecapa_loss=0.0001417, whisper_loss=0.09158, over 3861118.20 frames. ], batch size: 90, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:49:43,589 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4808280.0, ans=0.125 2024-08-20 12:49:51,661 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4808280.0, ans=0.0 2024-08-20 12:50:15,746 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.590e+01 2.324e+01 2.596e+01 3.081e+01 4.862e+01, threshold=5.192e+01, percent-clipped=0.0 2024-08-20 12:50:32,381 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 29 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-20 12:50:34,332 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 
29 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-20 12:51:15,028 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.72 vs. limit=22.5 2024-08-20 12:51:17,147 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.57 vs. limit=22.5 2024-08-20 12:51:33,289 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 6700, loss[loss=0.102, beats_loss=0.01112, ecapa_loss=0.0001041, whisper_loss=0.08979, over 14212.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0103, ecapa_loss=0.0001411, whisper_loss=0.09149, over 3896648.03 frames. ], batch size: 54, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:51:33,550 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 17 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-20 12:52:01,474 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4808880.0, ans=0.125 2024-08-20 12:52:03,454 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=4808880.0, ans=0.95 2024-08-20 12:52:12,308 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4808980.0, ans=0.0 2024-08-20 12:52:32,612 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4809080.0, ans=0.125 2024-08-20 12:52:54,825 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.31 vs. limit=22.5 2024-08-20 12:53:05,832 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 6750, loss[loss=0.08923, beats_loss=0.01115, ecapa_loss=0.0001091, whisper_loss=0.07699, over 21809.00 frames. 
], tot_loss[loss=0.1028, beats_loss=0.01034, ecapa_loss=0.0001415, whisper_loss=0.09109, over 3858327.31 frames. ], batch size: 86, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:53:08,042 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=4809280.0, ans=0.05 2024-08-20 12:53:35,108 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.982e+01 2.304e+01 2.494e+01 2.775e+01 4.602e+01, threshold=4.987e+01, percent-clipped=0.0 2024-08-20 12:53:40,916 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4809480.0, ans=0.125 2024-08-20 12:53:52,771 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 15 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-20 12:53:53,040 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4809480.0, ans=0.0 2024-08-20 12:54:32,335 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 6800, loss[loss=0.1071, beats_loss=0.01014, ecapa_loss=0.0001416, whisper_loss=0.09551, over 19817.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01039, ecapa_loss=0.0001417, whisper_loss=0.09068, over 3886907.96 frames. ], batch size: 80, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:54:33,522 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 12:54:55,876 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4809880.0, ans=0.0 2024-08-20 12:55:01,341 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4809880.0, ans=0.0 2024-08-20 12:55:07,237 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 
18 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-20 12:55:09,441 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4809980.0, ans=0.125 2024-08-20 12:55:28,089 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4810080.0, ans=0.125 2024-08-20 12:55:29,923 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4810080.0, ans=0.1 2024-08-20 12:55:36,676 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4810080.0, ans=0.125 2024-08-20 12:55:59,935 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 6850, loss[loss=0.1212, beats_loss=0.008362, ecapa_loss=0.0001454, whisper_loss=0.1114, over 21017.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01034, ecapa_loss=0.0001424, whisper_loss=0.09058, over 3851915.88 frames. ], batch size: 83, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:56:13,960 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 17 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-20 12:56:21,740 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4810380.0, ans=0.125 2024-08-20 12:56:26,551 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.48 vs. limit=22.5 2024-08-20 12:56:28,009 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. 
limit=6.0 2024-08-20 12:56:28,938 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.955e+01 2.376e+01 2.516e+01 2.843e+01 1.582e+02, threshold=5.033e+01, percent-clipped=2.0 2024-08-20 12:56:32,900 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 20 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-20 12:56:59,041 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4810580.0, ans=0.2 2024-08-20 12:57:00,133 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 21 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-20 12:57:16,451 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4810680.0, ans=0.0 2024-08-20 12:57:18,128 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4810680.0, ans=0.1 2024-08-20 12:57:25,121 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 22 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-20 12:57:30,360 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 6900, loss[loss=0.1028, beats_loss=0.01067, ecapa_loss=0.0001429, whisper_loss=0.09069, over 22230.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01033, ecapa_loss=0.0001414, whisper_loss=0.09042, over 3825521.50 frames. 
], batch size: 91, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:57:34,778 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4810780.0, ans=0.125 2024-08-20 12:57:49,150 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4810880.0, ans=0.125 2024-08-20 12:57:52,540 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4810880.0, ans=0.0 2024-08-20 12:57:53,628 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 17 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-20 12:57:58,818 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4810880.0, ans=0.05 2024-08-20 12:58:01,500 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 29 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-20 12:58:03,612 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4810880.0, ans=0.125 2024-08-20 12:58:08,882 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4810980.0, ans=0.0 2024-08-20 12:58:10,863 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=4810980.0, ans=10.0 2024-08-20 12:58:32,077 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.63 vs. limit=15.0 2024-08-20 12:58:32,613 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 26 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-20 12:58:41,603 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 
18 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-20 12:58:41,920 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4811180.0, ans=0.125 2024-08-20 12:58:52,256 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4811180.0, ans=0.0 2024-08-20 12:58:59,269 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 6950, loss[loss=0.0968, beats_loss=0.01059, ecapa_loss=0.0001265, whisper_loss=0.08494, over 14866.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01038, ecapa_loss=0.00014, whisper_loss=0.09024, over 3842480.07 frames. ], batch size: 58, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:59:01,375 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 22 from LS+wenet, 29 from Vox, 42 fro AS 2024-08-20 12:59:08,271 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 29 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-20 12:59:22,320 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 24 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-20 12:59:30,813 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.235e+01 2.363e+01 2.831e+01 5.596e+01, threshold=4.726e+01, percent-clipped=1.0 2024-08-20 12:59:36,740 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4811480.0, ans=0.125 2024-08-20 12:59:38,221 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 24 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-20 12:59:55,437 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 26 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-20 13:00:05,897 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 16 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-20 13:00:28,314 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 
27 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-20 13:00:29,824 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 7000, loss[loss=0.112, beats_loss=0.009884, ecapa_loss=0.0001522, whisper_loss=0.1006, over 20424.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01039, ecapa_loss=0.00014, whisper_loss=0.08995, over 3836415.98 frames. ], batch size: 81, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:00:32,615 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4811780.0, ans=0.125 2024-08-20 13:00:44,947 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4811780.0, ans=0.0 2024-08-20 13:00:48,074 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 20 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-20 13:00:59,843 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4811880.0, ans=0.2 2024-08-20 13:01:03,375 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4811880.0, ans=0.125 2024-08-20 13:01:08,682 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4811980.0, ans=0.125 2024-08-20 13:01:42,607 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4812180.0, ans=0.125 2024-08-20 13:01:59,332 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 7050, loss[loss=0.09488, beats_loss=0.009934, ecapa_loss=0.0001103, whisper_loss=0.08384, over 15152.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01036, ecapa_loss=0.0001401, whisper_loss=0.09042, over 3832263.38 frames. ], batch size: 58, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:02:22,272 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 
24 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-20 13:02:28,958 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 26 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-20 13:02:30,574 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 36 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-20 13:02:31,556 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.241e+01 2.463e+01 2.779e+01 3.668e+01, threshold=4.925e+01, percent-clipped=0.0 2024-08-20 13:02:32,939 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4812380.0, ans=0.0 2024-08-20 13:02:42,416 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 25 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-20 13:03:01,637 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4812580.0, ans=0.1 2024-08-20 13:03:18,821 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4812680.0, ans=0.0 2024-08-20 13:03:31,197 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 7100, loss[loss=0.1135, beats_loss=0.009737, ecapa_loss=0.0001532, whisper_loss=0.1022, over 21499.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01045, ecapa_loss=0.0001397, whisper_loss=0.09017, over 3830676.97 frames. 
], batch size: 87, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:03:33,649 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4812780.0, ans=0.1 2024-08-20 13:03:51,521 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4812880.0, ans=0.0 2024-08-20 13:04:18,734 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4812980.0, ans=0.125 2024-08-20 13:04:45,726 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 18 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-20 13:04:46,421 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4813180.0, ans=0.125 2024-08-20 13:04:59,461 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4813180.0, ans=0.1 2024-08-20 13:05:03,877 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4813280.0, ans=0.125 2024-08-20 13:05:04,630 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 7150, loss[loss=0.1159, beats_loss=0.006393, ecapa_loss=0.0001736, whisper_loss=0.1077, over 15685.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0105, ecapa_loss=0.0001396, whisper_loss=0.08968, over 3818862.74 frames. ], batch size: 56, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:05:13,843 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 
32 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-20 13:05:21,231 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4813380.0, ans=0.125 2024-08-20 13:05:36,029 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.222e+01 2.476e+01 2.874e+01 3.378e+02, threshold=4.952e+01, percent-clipped=1.0 2024-08-20 13:05:49,661 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4813480.0, ans=0.07 2024-08-20 13:06:13,211 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4813580.0, ans=0.125 2024-08-20 13:06:16,365 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 18 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-20 13:06:19,826 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 29 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-20 13:06:22,484 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.04 vs. limit=22.5 2024-08-20 13:06:24,264 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.41 vs. limit=15.0 2024-08-20 13:06:36,017 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 7200, loss[loss=0.09618, beats_loss=0.01181, ecapa_loss=0.0001636, whisper_loss=0.08273, over 20148.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01053, ecapa_loss=0.0001386, whisper_loss=0.08978, over 3799810.21 frames. ], batch size: 86, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:06:37,556 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 
33 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-20 13:06:55,048 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4813880.0, ans=0.125 2024-08-20 13:06:55,220 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.39 vs. limit=15.0 2024-08-20 13:07:06,533 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=4813880.0, ans=10.0 2024-08-20 13:07:10,556 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 26 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-20 13:07:16,137 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 16 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-20 13:07:28,325 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.55 vs. limit=10.0 2024-08-20 13:07:39,824 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4814080.0, ans=0.125 2024-08-20 13:07:43,686 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4814080.0, ans=0.125 2024-08-20 13:07:56,318 WARNING [optim.py:496] (1/4) Scaling gradients by 0.09890901297330856, model_norm_threshold=49.522193908691406 2024-08-20 13:07:56,472 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.12, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.024e+04, grad_sumsq=9.178e+03, orig_rms_sq=3.294e+00 2024-08-20 13:08:03,403 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 
31 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-20 13:08:07,245 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4814180.0, ans=0.125 2024-08-20 13:08:09,593 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 35 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-20 13:08:09,826 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4814280.0, ans=0.125 2024-08-20 13:08:10,559 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 7250, loss[loss=0.1177, beats_loss=0.008742, ecapa_loss=0.0001207, whisper_loss=0.1078, over 23807.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01042, ecapa_loss=0.0001396, whisper_loss=0.09039, over 3797621.21 frames. ], batch size: 91, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:08:32,995 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 35 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-20 13:08:41,432 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.541e+01 2.256e+01 2.635e+01 2.926e+01 5.007e+02, threshold=5.270e+01, percent-clipped=2.0 2024-08-20 13:09:17,217 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4814580.0, ans=0.125 2024-08-20 13:09:18,879 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4814580.0, ans=0.0 2024-08-20 13:09:20,643 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.04 vs. 
limit=15.0 2024-08-20 13:09:27,662 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4814680.0, ans=0.125 2024-08-20 13:09:39,683 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 7300, loss[loss=0.1071, beats_loss=0.008936, ecapa_loss=0.0001453, whisper_loss=0.09667, over 17159.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01039, ecapa_loss=0.0001406, whisper_loss=0.09054, over 3801762.86 frames. ], batch size: 68, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:09:52,191 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.22 vs. limit=22.5 2024-08-20 13:09:59,485 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4814880.0, ans=0.125 2024-08-20 13:10:01,006 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4814880.0, ans=0.07 2024-08-20 13:10:11,850 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4814880.0, ans=0.2 2024-08-20 13:10:13,452 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 21 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-20 13:10:37,020 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4815080.0, ans=0.1 2024-08-20 13:11:01,648 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 16 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-20 13:11:02,187 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.36 vs. 
limit=22.5 2024-08-20 13:11:08,086 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4815180.0, ans=0.125 2024-08-20 13:11:11,673 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 28 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-20 13:11:15,959 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4815280.0, ans=0.125 2024-08-20 13:11:17,552 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 7350, loss[loss=0.086, beats_loss=0.01, ecapa_loss=0.0001636, whisper_loss=0.07436, over 20708.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01028, ecapa_loss=0.0001422, whisper_loss=0.09119, over 3813624.95 frames. ], batch size: 86, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:11:28,008 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 16 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-20 13:11:30,517 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4815280.0, ans=0.2 2024-08-20 13:11:51,733 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.45 vs. limit=15.0 2024-08-20 13:11:52,796 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.308e+01 2.500e+01 2.769e+01 3.790e+01, threshold=5.001e+01, percent-clipped=0.0 2024-08-20 13:12:05,619 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 
25 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-20 13:12:36,270 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4815580.0, ans=0.125 2024-08-20 13:12:41,970 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4815680.0, ans=0.1 2024-08-20 13:12:50,789 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4815680.0, ans=0.125 2024-08-20 13:13:01,529 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 7400, loss[loss=0.08101, beats_loss=0.01086, ecapa_loss=0.0001357, whisper_loss=0.0688, over 19039.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01031, ecapa_loss=0.0001417, whisper_loss=0.09058, over 3826688.44 frames. ], batch size: 79, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:13:01,752 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 16 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-20 13:13:05,365 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4815780.0, ans=0.025 2024-08-20 13:13:19,484 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4815780.0, ans=0.125 2024-08-20 13:14:02,019 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4816080.0, ans=0.015 2024-08-20 13:14:02,505 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4816080.0, ans=0.1 2024-08-20 13:14:22,161 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 
30 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-20 13:14:31,217 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4816180.0, ans=0.125 2024-08-20 13:14:38,171 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 7450, loss[loss=0.09237, beats_loss=0.0123, ecapa_loss=0.0001202, whisper_loss=0.07887, over 23068.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01026, ecapa_loss=0.0001424, whisper_loss=0.09075, over 3803392.02 frames. ], batch size: 91, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:14:38,353 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 16 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-20 13:14:54,545 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4816280.0, ans=0.125 2024-08-20 13:15:00,627 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4816380.0, ans=0.0 2024-08-20 13:15:11,849 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.691e+01 2.199e+01 2.442e+01 2.667e+01 5.088e+01, threshold=4.883e+01, percent-clipped=1.0 2024-08-20 13:15:30,431 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4816480.0, ans=0.2 2024-08-20 13:15:40,441 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4816580.0, ans=0.1 2024-08-20 13:15:46,649 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4816580.0, ans=0.04949747468305833 2024-08-20 13:16:02,473 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4816680.0, ans=0.125 2024-08-20 13:16:12,057 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 
22 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-20 13:16:12,323 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4816680.0, ans=0.125 2024-08-20 13:16:19,359 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 7500, loss[loss=0.1202, beats_loss=0.007176, ecapa_loss=0.0001024, whisper_loss=0.112, over 18462.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01036, ecapa_loss=0.000141, whisper_loss=0.08959, over 3823398.63 frames. ], batch size: 66, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:16:34,614 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4816780.0, ans=0.2 2024-08-20 13:16:35,060 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.19 vs. limit=22.5 2024-08-20 13:17:22,965 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4817080.0, ans=0.125 2024-08-20 13:17:32,413 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 22 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-20 13:17:36,557 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4817080.0, ans=0.1 2024-08-20 13:17:40,856 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.08 vs. limit=15.0 2024-08-20 13:17:57,684 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4817180.0, ans=0.0 2024-08-20 13:18:01,084 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 7550, loss[loss=0.07584, beats_loss=0.01408, ecapa_loss=0.000109, whisper_loss=0.06067, over 18980.00 frames. 
], tot_loss[loss=0.1011, beats_loss=0.01044, ecapa_loss=0.0001408, whisper_loss=0.08921, over 3809835.52 frames. ], batch size: 79, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:18:12,754 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.48 vs. limit=15.0 2024-08-20 13:18:17,619 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4817280.0, ans=0.125 2024-08-20 13:18:34,752 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.663e+01 2.221e+01 2.519e+01 2.793e+01 1.462e+02, threshold=5.038e+01, percent-clipped=2.0 2024-08-20 13:18:56,835 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4817480.0, ans=0.0 2024-08-20 13:19:00,907 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4817580.0, ans=0.125 2024-08-20 13:19:19,411 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.20 vs. limit=15.0 2024-08-20 13:19:38,636 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4817680.0, ans=0.125 2024-08-20 13:19:42,578 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 7600, loss[loss=0.1203, beats_loss=0.007738, ecapa_loss=0.0001608, whisper_loss=0.1109, over 19885.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01036, ecapa_loss=0.0001411, whisper_loss=0.08996, over 3812292.59 frames. 
], batch size: 76, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:20:05,554 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4817880.0, ans=0.125 2024-08-20 13:20:20,424 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=4817980.0, ans=0.025 2024-08-20 13:20:20,482 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4817980.0, ans=0.125 2024-08-20 13:20:21,994 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 25 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-20 13:20:24,300 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4817980.0, ans=0.0 2024-08-20 13:20:26,181 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4817980.0, ans=0.1 2024-08-20 13:20:32,848 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 26 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-20 13:20:35,685 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=4817980.0, ans=0.05 2024-08-20 13:20:45,451 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.354e-01 2024-08-20 13:20:47,503 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4818080.0, ans=0.1 2024-08-20 13:20:52,458 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.44 vs. 
limit=10.0 2024-08-20 13:20:59,270 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.75 vs. limit=15.0 2024-08-20 13:21:00,697 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4818180.0, ans=0.125 2024-08-20 13:21:00,709 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4818180.0, ans=0.05 2024-08-20 13:21:04,773 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 28 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-20 13:21:19,811 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 7650, loss[loss=0.09752, beats_loss=0.008602, ecapa_loss=0.0001231, whisper_loss=0.08769, over 16669.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01032, ecapa_loss=0.0001413, whisper_loss=0.09056, over 3851550.81 frames. ], batch size: 61, lr: 1.85e-03, grad_scale: 1.152921504606847e+18 2024-08-20 13:21:22,540 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4818280.0, ans=0.125 2024-08-20 13:21:24,643 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.88 vs. limit=15.0 2024-08-20 13:21:43,221 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 
28 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-20 13:21:52,255 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.373e+01 2.628e+01 2.994e+01 5.178e+01, threshold=5.256e+01, percent-clipped=1.0 2024-08-20 13:22:14,635 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4818480.0, ans=0.125 2024-08-20 13:22:38,195 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.32 vs. limit=15.0 2024-08-20 13:22:42,386 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4818680.0, ans=0.125 2024-08-20 13:22:57,174 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 7700, loss[loss=0.1004, beats_loss=0.01031, ecapa_loss=0.0001403, whisper_loss=0.0887, over 23931.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01037, ecapa_loss=0.0001416, whisper_loss=0.08998, over 3856905.11 frames. ], batch size: 97, lr: 1.85e-03, grad_scale: 1.152921504606847e+18 2024-08-20 13:23:08,104 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4818780.0, ans=0.125 2024-08-20 13:23:11,377 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4818780.0, ans=0.125 2024-08-20 13:23:25,474 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 40 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-20 13:23:27,868 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=8.97 vs. limit=12.0 2024-08-20 13:23:42,048 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 
23 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-20 13:23:42,798 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.14 vs. limit=10.0 2024-08-20 13:23:54,620 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4819080.0, ans=0.1 2024-08-20 13:24:15,332 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.10 vs. limit=15.0 2024-08-20 13:24:39,016 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 7750, loss[loss=0.1142, beats_loss=0.008787, ecapa_loss=0.0001158, whisper_loss=0.1043, over 19609.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01034, ecapa_loss=0.0001412, whisper_loss=0.09017, over 3863876.78 frames. ], batch size: 71, lr: 1.85e-03, grad_scale: 1.152921504606847e+18 2024-08-20 13:24:47,322 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.371e-02 2024-08-20 13:24:54,009 INFO [train_multi_KD3.py:845] (1/4) A total of 54 cuts. 22 from LS+wenet, 15 from Vox, 17 fro AS 2024-08-20 13:25:13,251 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4819380.0, ans=0.2 2024-08-20 13:25:16,266 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.228e+01 2.412e+01 2.689e+01 6.555e+01, threshold=4.823e+01, percent-clipped=1.0 2024-08-20 13:25:17,878 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.68 vs. 
limit=15.0 2024-08-20 13:25:27,732 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4819480.0, ans=0.125 2024-08-20 13:25:53,380 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4819580.0, ans=0.125 2024-08-20 13:26:17,568 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 7800, loss[loss=0.1011, beats_loss=0.0124, ecapa_loss=0.0001394, whisper_loss=0.08734, over 22775.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01033, ecapa_loss=0.000141, whisper_loss=0.0904, over 3829879.51 frames. ], batch size: 94, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:26:21,428 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.25 vs. limit=10.0 2024-08-20 13:26:38,545 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 30 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-20 13:26:49,034 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.37 vs. limit=22.5 2024-08-20 13:26:56,959 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.97 vs. limit=15.0 2024-08-20 13:27:03,962 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 31 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-20 13:27:04,553 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4819980.0, ans=0.0 2024-08-20 13:27:04,703 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. 
limit=6.0 2024-08-20 13:27:20,754 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4820080.0, ans=0.125 2024-08-20 13:27:32,210 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 25 from LS+wenet, 19 from Vox, 50 fro AS 2024-08-20 13:27:42,824 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4820180.0, ans=0.125 2024-08-20 13:27:56,352 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4820280.0, ans=0.1 2024-08-20 13:27:58,223 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 7850, loss[loss=0.08564, beats_loss=0.0114, ecapa_loss=0.0001431, whisper_loss=0.07281, over 14181.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0104, ecapa_loss=0.0001409, whisper_loss=0.09009, over 3860002.48 frames. ], batch size: 58, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:27:58,999 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.67 vs. limit=15.0 2024-08-20 13:28:00,834 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4820280.0, ans=0.125 2024-08-20 13:28:04,110 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 23 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-20 13:28:34,228 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.297e+01 2.493e+01 2.759e+01 4.989e+01, threshold=4.986e+01, percent-clipped=1.0 2024-08-20 13:28:56,109 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 
25 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-20 13:29:09,430 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4820580.0, ans=0.125 2024-08-20 13:29:13,538 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.72 vs. limit=12.0 2024-08-20 13:29:15,119 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4820580.0, ans=0.2 2024-08-20 13:29:22,845 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4820680.0, ans=0.125 2024-08-20 13:29:25,014 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4820680.0, ans=0.1 2024-08-20 13:29:36,612 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 18 from LS+wenet, 9 from Vox, 24 fro AS 2024-08-20 13:29:40,366 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 7900, loss[loss=0.1086, beats_loss=0.01033, ecapa_loss=0.0001369, whisper_loss=0.09687, over 23806.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01035, ecapa_loss=0.0001412, whisper_loss=0.0897, over 3843814.87 frames. ], batch size: 93, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:29:57,059 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 37 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-20 13:30:05,830 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 
11 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-20 13:30:11,931 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4820880.0, ans=0.0 2024-08-20 13:30:15,961 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4820880.0, ans=0.125 2024-08-20 13:31:18,376 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4821280.0, ans=0.0 2024-08-20 13:31:18,645 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.43 vs. limit=22.5 2024-08-20 13:31:19,932 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 7950, loss[loss=0.1168, beats_loss=0.008347, ecapa_loss=0.000151, whisper_loss=0.1069, over 22672.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01039, ecapa_loss=0.0001397, whisper_loss=0.08914, over 3832202.50 frames. ], batch size: 91, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:31:26,583 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4821280.0, ans=0.125 2024-08-20 13:31:41,357 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 29 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-20 13:31:56,047 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.297e+01 2.495e+01 2.761e+01 3.642e+01, threshold=4.990e+01, percent-clipped=0.0 2024-08-20 13:32:05,545 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 
23 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-20 13:32:09,598 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4821480.0, ans=0.025 2024-08-20 13:32:25,989 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.95 vs. limit=15.0 2024-08-20 13:32:42,213 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 27 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-20 13:32:55,132 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 8000, loss[loss=0.1044, beats_loss=0.009076, ecapa_loss=0.0001601, whisper_loss=0.09369, over 22766.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01033, ecapa_loss=0.0001403, whisper_loss=0.08988, over 3840125.67 frames. ], batch size: 94, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:33:00,677 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4821780.0, ans=0.2 2024-08-20 13:33:02,775 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.06 vs. limit=22.5 2024-08-20 13:33:11,246 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4821780.0, ans=0.125 2024-08-20 13:33:11,275 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4821780.0, ans=0.2 2024-08-20 13:33:20,608 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4821880.0, ans=0.1 2024-08-20 13:33:30,181 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 
18 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-20 13:33:47,839 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4821980.0, ans=0.125 2024-08-20 13:33:53,114 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 27 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-20 13:33:59,247 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 21 from LS+wenet, 13 from Vox, 17 fro AS 2024-08-20 13:34:25,388 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4822180.0, ans=0.125 2024-08-20 13:34:32,034 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 8050, loss[loss=0.09852, beats_loss=0.01, ecapa_loss=0.0001382, whisper_loss=0.08713, over 18171.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01034, ecapa_loss=0.0001398, whisper_loss=0.08955, over 3818822.72 frames. ], batch size: 72, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:34:54,800 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 16 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-20 13:34:56,617 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 
25 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-20 13:35:07,267 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.231e+01 2.467e+01 2.722e+01 2.720e+02, threshold=4.934e+01, percent-clipped=1.0 2024-08-20 13:35:11,132 WARNING [optim.py:496] (1/4) Scaling gradients by 0.020375000312924385, model_norm_threshold=49.342281341552734 2024-08-20 13:35:11,288 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.714e+05, grad_sumsq=7.714e+05, orig_rms_sq=1.000e+00 2024-08-20 13:35:19,840 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4822480.0, ans=0.1 2024-08-20 13:35:31,564 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4822580.0, ans=0.1 2024-08-20 13:35:31,653 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4822580.0, ans=0.125 2024-08-20 13:35:59,135 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4822680.0, ans=0.1 2024-08-20 13:36:03,173 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 23 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-20 13:36:10,112 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 8100, loss[loss=0.1188, beats_loss=0.01004, ecapa_loss=0.0001386, whisper_loss=0.1074, over 20087.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01043, ecapa_loss=0.0001397, whisper_loss=0.08993, over 3845552.38 frames. ], batch size: 79, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:36:12,876 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 
24 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-20 13:36:32,174 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4822880.0, ans=0.0 2024-08-20 13:36:52,147 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4822980.0, ans=0.2 2024-08-20 13:36:55,898 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 39 from LS+wenet, 13 from Vox, 41 fro AS 2024-08-20 13:37:06,074 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4822980.0, ans=0.125 2024-08-20 13:37:12,325 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4823080.0, ans=0.125 2024-08-20 13:37:43,935 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 25 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-20 13:37:50,172 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 8150, loss[loss=0.1049, beats_loss=0.01083, ecapa_loss=0.0001249, whisper_loss=0.0928, over 22981.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0105, ecapa_loss=0.0001391, whisper_loss=0.09044, over 3851128.16 frames. ], batch size: 89, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:37:56,624 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.94 vs. limit=22.5 2024-08-20 13:38:01,879 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4823280.0, ans=0.125 2024-08-20 13:38:05,861 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.30 vs. 
limit=15.0 2024-08-20 13:38:23,613 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.223e+01 2.514e+01 2.822e+01 2.422e+03, threshold=5.028e+01, percent-clipped=2.0 2024-08-20 13:38:33,731 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4823480.0, ans=0.125 2024-08-20 13:38:34,952 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4823480.0, ans=0.1 2024-08-20 13:39:07,332 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 24 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-20 13:39:24,996 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4823780.0, ans=0.0 2024-08-20 13:39:26,036 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 8200, loss[loss=0.1145, beats_loss=0.008704, ecapa_loss=0.0001158, whisper_loss=0.1046, over 16147.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01054, ecapa_loss=0.0001385, whisper_loss=0.08963, over 3812778.26 frames. ], batch size: 60, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:39:26,269 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 35 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-20 13:39:31,209 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0 2024-08-20 13:39:33,301 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.89 vs. limit=15.0 2024-08-20 13:40:08,145 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 
30 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-20 13:40:33,479 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4824080.0, ans=0.07 2024-08-20 13:40:43,150 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 20 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-20 13:41:05,389 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 8250, loss[loss=0.105, beats_loss=0.01135, ecapa_loss=0.0001114, whisper_loss=0.09251, over 23499.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01049, ecapa_loss=0.0001383, whisper_loss=0.08992, over 3839764.56 frames. ], batch size: 90, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:41:33,271 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4824380.0, ans=10.0 2024-08-20 13:41:35,448 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4824380.0, ans=0.125 2024-08-20 13:41:40,704 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.700e+01 2.279e+01 2.520e+01 2.852e+01 4.224e+01, threshold=5.040e+01, percent-clipped=0.0 2024-08-20 13:42:08,134 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4824580.0, ans=0.0 2024-08-20 13:42:35,844 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4824680.0, ans=0.0 2024-08-20 13:42:44,654 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 8300, loss[loss=0.1076, beats_loss=0.01062, ecapa_loss=0.0001443, whisper_loss=0.09557, over 20992.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01039, ecapa_loss=0.0001393, whisper_loss=0.09067, over 3819738.26 frames. ], batch size: 87, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:42:44,896 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 
25 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-20 13:42:57,839 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 29 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-20 13:43:22,550 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 24 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-20 13:43:51,814 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.84 vs. limit=15.0 2024-08-20 13:43:51,906 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.21 vs. limit=10.0 2024-08-20 13:44:08,092 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4825180.0, ans=0.125 2024-08-20 13:44:09,767 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 28 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-20 13:44:11,604 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 26 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-20 13:44:23,241 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 8350, loss[loss=0.1046, beats_loss=0.01168, ecapa_loss=0.0001213, whisper_loss=0.09168, over 22415.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01042, ecapa_loss=0.0001399, whisper_loss=0.09035, over 3819261.75 frames. ], batch size: 88, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:44:27,808 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4825280.0, ans=0.2 2024-08-20 13:44:29,989 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4825280.0, ans=0.0 2024-08-20 13:44:36,277 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.91 vs. 
limit=6.0 2024-08-20 13:44:42,503 WARNING [optim.py:496] (1/4) Scaling gradients by 0.025893952697515488, model_norm_threshold=50.39912796020508 2024-08-20 13:44:42,658 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.24, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=9.172e+05, grad_sumsq=8.560e+07, orig_rms_sq=1.071e-02 2024-08-20 13:44:58,390 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.380e+01 2.657e+01 2.993e+01 1.946e+03, threshold=5.314e+01, percent-clipped=1.0 2024-08-20 13:45:03,259 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.14 vs. limit=12.0 2024-08-20 13:45:06,322 INFO [train_multi_KD3.py:845] (1/4) A total of 52 cuts. 8 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-20 13:45:06,868 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4825480.0, ans=0.1 2024-08-20 13:45:14,746 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 37 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-20 13:45:46,276 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 33 from LS+wenet, 10 from Vox, 38 fro AS 2024-08-20 13:46:02,258 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 8400, loss[loss=0.1323, beats_loss=0.00778, ecapa_loss=0.0001957, whisper_loss=0.1226, over 16159.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01038, ecapa_loss=0.0001395, whisper_loss=0.09115, over 3824020.73 frames. ], batch size: 65, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:46:17,125 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4825780.0, ans=0.0 2024-08-20 13:46:41,297 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 
18 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-20 13:46:43,447 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4825980.0, ans=0.2 2024-08-20 13:46:46,904 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 30 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-20 13:47:24,312 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.24 vs. limit=15.0 2024-08-20 13:47:35,664 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4826180.0, ans=0.125 2024-08-20 13:47:40,624 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 8450, loss[loss=0.1022, beats_loss=0.008386, ecapa_loss=0.0001586, whisper_loss=0.09221, over 22780.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01047, ecapa_loss=0.0001401, whisper_loss=0.08993, over 3815241.56 frames. ], batch size: 92, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:47:48,879 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 27 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-20 13:47:51,038 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 24 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-20 13:48:14,165 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.02 vs. 
limit=15.0 2024-08-20 13:48:15,372 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.337e+01 2.560e+01 2.840e+01 5.771e+01, threshold=5.121e+01, percent-clipped=2.0 2024-08-20 13:48:19,633 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=4826480.0, ans=0.05 2024-08-20 13:48:48,499 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4826580.0, ans=0.125 2024-08-20 13:49:06,595 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 22 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-20 13:49:10,202 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4826680.0, ans=0.0 2024-08-20 13:49:17,567 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 8500, loss[loss=0.109, beats_loss=0.007376, ecapa_loss=0.0001442, whisper_loss=0.1002, over 16258.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01048, ecapa_loss=0.0001395, whisper_loss=0.08947, over 3808604.00 frames. ], batch size: 62, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:49:26,013 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4826780.0, ans=0.125 2024-08-20 13:49:44,315 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4826880.0, ans=0.04949747468305833 2024-08-20 13:49:55,197 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. 
limit=6.0 2024-08-20 13:50:05,926 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4826980.0, ans=0.125 2024-08-20 13:50:23,945 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4827080.0, ans=0.07 2024-08-20 13:50:42,259 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.48 vs. limit=10.0 2024-08-20 13:50:43,786 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4827180.0, ans=0.125 2024-08-20 13:50:45,447 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 28 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-20 13:50:46,426 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 8550, loss[loss=0.1053, beats_loss=0.01114, ecapa_loss=0.0001078, whisper_loss=0.0931, over 21829.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01042, ecapa_loss=0.0001404, whisper_loss=0.08971, over 3814230.01 frames. ], batch size: 83, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:50:52,904 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 24 from LS+wenet, 9 from Vox, 23 fro AS 2024-08-20 13:51:05,546 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. limit=6.0 2024-08-20 13:51:12,928 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=4827380.0, ans=0.5 2024-08-20 13:51:13,273 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.36 vs. 
limit=15.0 2024-08-20 13:51:20,105 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.327e+01 2.509e+01 2.689e+01 1.250e+02, threshold=5.019e+01, percent-clipped=1.0 2024-08-20 13:51:20,673 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4827380.0, ans=0.0 2024-08-20 13:51:27,697 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 22 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-20 13:51:39,029 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 26 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-20 13:51:43,122 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4827580.0, ans=0.2 2024-08-20 13:51:50,585 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4827580.0, ans=0.0 2024-08-20 13:51:50,616 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4827580.0, ans=0.09899494936611666 2024-08-20 13:51:57,326 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 21 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-20 13:52:16,556 INFO [train_multi_KD3.py:845] (1/4) A total of 96 cuts. 37 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-20 13:52:18,411 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 8600, loss[loss=0.1163, beats_loss=0.01098, ecapa_loss=0.0001317, whisper_loss=0.104, over 23927.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01036, ecapa_loss=0.0001401, whisper_loss=0.08998, over 3807923.46 frames. ], batch size: 96, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:52:22,218 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 23 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-20 13:52:25,388 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 
21 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-20 13:52:36,249 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4827880.0, ans=0.0 2024-08-20 13:52:36,356 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4827880.0, ans=0.125 2024-08-20 13:53:04,836 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.50 vs. limit=10.0 2024-08-20 13:53:22,883 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4828080.0, ans=0.1 2024-08-20 13:53:47,616 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 8650, loss[loss=0.1134, beats_loss=0.01101, ecapa_loss=0.0001139, whisper_loss=0.1012, over 24290.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01038, ecapa_loss=0.0001397, whisper_loss=0.0892, over 3805696.48 frames. ], batch size: 93, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:54:05,541 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 21 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-20 13:54:07,994 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.29 vs. limit=22.5 2024-08-20 13:54:16,602 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 28 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-20 13:54:17,305 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.20 vs. 
limit=15.0 2024-08-20 13:54:21,940 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.355e+01 2.590e+01 2.852e+01 2.640e+02, threshold=5.179e+01, percent-clipped=3.0 2024-08-20 13:54:57,943 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4828580.0, ans=0.125 2024-08-20 13:55:21,895 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 8700, loss[loss=0.1201, beats_loss=0.008469, ecapa_loss=0.0001223, whisper_loss=0.1104, over 16226.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01036, ecapa_loss=0.0001386, whisper_loss=0.08941, over 3792970.03 frames. ], batch size: 60, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:55:42,177 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4828880.0, ans=0.2 2024-08-20 13:55:48,049 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4828880.0, ans=0.1 2024-08-20 13:55:55,556 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4828880.0, ans=0.125 2024-08-20 13:56:32,443 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 27 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-20 13:56:51,273 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 29 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-20 13:56:53,947 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 8750, loss[loss=0.1164, beats_loss=0.009371, ecapa_loss=0.000117, whisper_loss=0.1059, over 23290.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01031, ecapa_loss=0.000139, whisper_loss=0.09022, over 3800581.48 frames. 
], batch size: 91, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:57:00,883 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 13:57:09,660 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4829280.0, ans=0.125 2024-08-20 13:57:20,526 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4829380.0, ans=0.125 2024-08-20 13:57:24,664 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4829380.0, ans=0.09899494936611666 2024-08-20 13:57:26,667 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4829380.0, ans=0.0 2024-08-20 13:57:28,569 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.92 vs. limit=22.5 2024-08-20 13:57:29,056 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.350e+01 2.524e+01 2.778e+01 9.782e+01, threshold=5.048e+01, percent-clipped=1.0 2024-08-20 13:57:42,480 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4829480.0, ans=0.125 2024-08-20 13:57:52,233 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4829580.0, ans=0.2 2024-08-20 13:58:02,907 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 21 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-20 13:58:07,830 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 
21 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-20 13:58:21,600 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4829680.0, ans=0.2 2024-08-20 13:58:23,110 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 23 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-20 13:58:24,232 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 8800, loss[loss=0.08798, beats_loss=0.00747, ecapa_loss=0.0001567, whisper_loss=0.07894, over 19070.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01023, ecapa_loss=0.0001394, whisper_loss=0.09062, over 3773003.57 frames. ], batch size: 76, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:59:08,253 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4829980.0, ans=0.0 2024-08-20 13:59:29,346 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 30 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-20 13:59:32,134 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.49 vs. limit=15.0 2024-08-20 13:59:55,618 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 8850, loss[loss=0.1063, beats_loss=0.01052, ecapa_loss=0.0001418, whisper_loss=0.09434, over 19587.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01036, ecapa_loss=0.0001389, whisper_loss=0.08943, over 3769357.33 frames. 
], batch size: 78, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:00:00,405 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4830280.0, ans=10.0 2024-08-20 14:00:02,796 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4830280.0, ans=0.2 2024-08-20 14:00:10,829 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.56 vs. limit=15.0 2024-08-20 14:00:30,719 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.235e+01 2.490e+01 2.757e+01 4.655e+01, threshold=4.979e+01, percent-clipped=0.0 2024-08-20 14:00:41,473 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 25 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-20 14:01:10,159 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 20 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-20 14:01:10,785 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4830680.0, ans=0.2 2024-08-20 14:01:26,979 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.42 vs. limit=12.0 2024-08-20 14:01:29,250 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 8900, loss[loss=0.1079, beats_loss=0.01054, ecapa_loss=0.0001511, whisper_loss=0.09582, over 22822.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01041, ecapa_loss=0.0001385, whisper_loss=0.08947, over 3756014.07 frames. 
], batch size: 92, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:01:34,128 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4830780.0, ans=0.125 2024-08-20 14:01:48,548 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.00 vs. limit=10.0 2024-08-20 14:01:51,869 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 21 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-20 14:01:52,069 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4830880.0, ans=0.0 2024-08-20 14:02:16,255 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4830980.0, ans=0.04949747468305833 2024-08-20 14:02:35,496 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 14:02:55,088 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 26 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-20 14:02:59,819 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 8950, loss[loss=0.1039, beats_loss=0.01172, ecapa_loss=0.0001347, whisper_loss=0.09081, over 20867.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01038, ecapa_loss=0.0001387, whisper_loss=0.0892, over 3787002.82 frames. ], batch size: 86, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:03:21,564 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4831380.0, ans=0.0 2024-08-20 14:03:30,485 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.318e+01 2.492e+01 2.834e+01 3.721e+01, threshold=4.984e+01, percent-clipped=0.0 2024-08-20 14:03:45,453 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 
12 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-20 14:04:21,933 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 22 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-20 14:04:23,569 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 31 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-20 14:04:26,524 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 9000, loss[loss=0.1005, beats_loss=0.01033, ecapa_loss=0.0001568, whisper_loss=0.08862, over 20344.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01038, ecapa_loss=0.0001396, whisper_loss=0.08941, over 3765157.04 frames. ], batch size: 83, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:04:26,524 INFO [train_multi_KD3.py:1140] (1/4) Computing validation loss 2024-08-20 14:05:08,442 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.0219, 1.3087, 1.8825, 0.9366, 1.0602, 1.6017, 1.8697, 1.8648], device='cuda:1') 2024-08-20 14:05:10,580 INFO [train_multi_KD3.py:1150] (1/4) Epoch 33, validation on ASR_libri: loss=0.2544, beats_loss=0, ecapa_loss=0.0005032, whisper_loss=0.2493, over 931116.00 frames. 2024-08-20 14:05:34,672 INFO [train_multi_KD3.py:1150] (1/4) Epoch 33, validation on SV_voxceleb1: loss=0.003984, beats_loss=0, ecapa_loss=0.0003984, whisper_loss=0, over 944235.00 frames. 2024-08-20 14:07:37,293 INFO [train_multi_KD3.py:1150] (1/4) Epoch 33, validation on AT_audioset: loss=0.02297, beats_loss=0.02297, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-20 14:07:37,297 INFO [train_multi_KD3.py:1156] (1/4) Maximum memory allocated so far is 30838MB 2024-08-20 14:07:54,811 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 22 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-20 14:07:58,762 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.88 vs. 
limit=15.0 2024-08-20 14:08:07,273 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4831880.0, ans=0.125 2024-08-20 14:08:08,737 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 35 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-20 14:08:10,616 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 23 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-20 14:08:25,502 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4831980.0, ans=0.125 2024-08-20 14:08:25,524 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4831980.0, ans=0.0 2024-08-20 14:08:59,441 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4832280.0, ans=0.0 2024-08-20 14:09:00,243 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 9050, loss[loss=0.122, beats_loss=0.009129, ecapa_loss=0.0001207, whisper_loss=0.1117, over 14590.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01047, ecapa_loss=0.0001395, whisper_loss=0.08861, over 3756521.15 frames. ], batch size: 54, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:09:06,103 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.whiten.whitening_limit, batch_count=4832280.0, ans=12.0 2024-08-20 14:09:06,903 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 37 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-20 14:09:13,837 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 
19 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-20 14:09:20,112 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4832380.0, ans=0.0 2024-08-20 14:09:23,580 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4832380.0, ans=0.125 2024-08-20 14:09:29,419 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.730e+01 2.212e+01 2.355e+01 2.625e+01 3.620e+01, threshold=4.711e+01, percent-clipped=0.0 2024-08-20 14:09:30,911 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.31 vs. limit=5.0 2024-08-20 14:09:35,543 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.89 vs. limit=12.0 2024-08-20 14:09:40,375 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4832480.0, ans=0.1 2024-08-20 14:09:47,442 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.83 vs. limit=15.0 2024-08-20 14:09:48,836 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4832580.0, ans=0.0 2024-08-20 14:09:57,457 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 36 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-20 14:09:59,158 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 38 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-20 14:10:07,979 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4832680.0, ans=0.1 2024-08-20 14:10:11,343 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 12 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-20 14:10:23,265 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 
17 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-20 14:10:24,906 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 9100, loss[loss=0.07825, beats_loss=0.01256, ecapa_loss=0.0001372, whisper_loss=0.06433, over 16723.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01041, ecapa_loss=0.0001398, whisper_loss=0.08942, over 3768531.25 frames. ], batch size: 69, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:11:03,932 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 30 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-20 14:11:04,332 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.84 vs. limit=15.0 2024-08-20 14:11:10,936 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 34 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-20 14:11:36,409 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 26 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-20 14:11:41,699 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 24 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-20 14:11:41,930 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4833180.0, ans=0.125 2024-08-20 14:11:43,802 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4833180.0, ans=0.0 2024-08-20 14:11:46,694 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 25 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-20 14:11:52,708 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 9150, loss[loss=0.1208, beats_loss=0.009363, ecapa_loss=0.0001196, whisper_loss=0.1102, over 16793.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01036, ecapa_loss=0.0001401, whisper_loss=0.09005, over 3804654.23 frames. ], batch size: 63, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:11:57,139 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 
28 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-20 14:12:07,844 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4833380.0, ans=0.125 2024-08-20 14:12:14,824 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4833380.0, ans=0.0 2024-08-20 14:12:22,570 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.265e+01 2.494e+01 2.821e+01 1.323e+02, threshold=4.988e+01, percent-clipped=2.0 2024-08-20 14:12:27,743 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.60 vs. limit=10.0 2024-08-20 14:12:39,019 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=25.41 vs. limit=22.5 2024-08-20 14:13:09,339 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4833680.0, ans=0.0 2024-08-20 14:13:19,902 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 9200, loss[loss=0.1241, beats_loss=0.008167, ecapa_loss=0.0001436, whisper_loss=0.1145, over 14261.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01029, ecapa_loss=0.0001402, whisper_loss=0.09074, over 3797842.33 frames. ], batch size: 56, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:13:22,922 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.16 vs. limit=15.0 2024-08-20 14:13:57,723 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 
23 from LS+wenet, 18 from Vox, 51 fro AS 2024-08-20 14:14:17,505 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4834080.0, ans=0.07 2024-08-20 14:14:43,113 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 23 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-20 14:14:46,038 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 9250, loss[loss=0.1061, beats_loss=0.01187, ecapa_loss=0.0001138, whisper_loss=0.09313, over 22669.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01035, ecapa_loss=0.0001387, whisper_loss=0.09009, over 3742597.50 frames. ], batch size: 91, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:14:58,995 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4834280.0, ans=0.125 2024-08-20 14:14:59,564 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.21 vs. limit=22.5 2024-08-20 14:15:16,770 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.272e+01 2.596e+01 3.076e+01 4.662e+01, threshold=5.191e+01, percent-clipped=0.0 2024-08-20 14:15:17,941 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 26 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-20 14:15:21,642 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 28 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-20 14:15:33,659 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4834480.0, ans=0.0 2024-08-20 14:16:13,253 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 9300, loss[loss=0.08744, beats_loss=0.01272, ecapa_loss=9.973e-05, whisper_loss=0.07372, over 22657.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01035, ecapa_loss=0.0001385, whisper_loss=0.09063, over 3791603.22 frames. 
], batch size: 89, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:16:14,048 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4834780.0, ans=0.0 2024-08-20 14:16:31,440 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.05 vs. limit=15.0 2024-08-20 14:16:39,590 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4834880.0, ans=0.0 2024-08-20 14:16:39,651 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4834880.0, ans=0.025 2024-08-20 14:17:07,872 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4835080.0, ans=0.1 2024-08-20 14:17:32,197 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4835180.0, ans=0.0 2024-08-20 14:17:35,487 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4835180.0, ans=0.0 2024-08-20 14:17:41,437 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4835180.0, ans=0.1 2024-08-20 14:17:44,341 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 9350, loss[loss=0.1019, beats_loss=0.01156, ecapa_loss=0.0001115, whisper_loss=0.08921, over 19540.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01032, ecapa_loss=0.0001384, whisper_loss=0.09042, over 3784494.62 frames. ], batch size: 75, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:17:50,046 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 
23 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-20 14:17:51,060 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4835280.0, ans=0.125 2024-08-20 14:17:55,517 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4835280.0, ans=0.015 2024-08-20 14:18:00,215 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 34 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-20 14:18:03,170 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4835380.0, ans=0.125 2024-08-20 14:18:17,253 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.674e+01 2.243e+01 2.510e+01 2.725e+01 8.699e+01, threshold=5.020e+01, percent-clipped=1.0 2024-08-20 14:18:19,607 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=4835480.0, ans=0.5 2024-08-20 14:18:22,148 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.81 vs. limit=15.0 2024-08-20 14:18:57,352 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4835680.0, ans=0.0 2024-08-20 14:19:16,739 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 9400, loss[loss=0.1055, beats_loss=0.01086, ecapa_loss=0.0001483, whisper_loss=0.0932, over 21419.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01034, ecapa_loss=0.0001398, whisper_loss=0.09034, over 3792500.23 frames. 
], batch size: 89, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:19:17,199 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4835780.0, ans=0.035 2024-08-20 14:19:25,070 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4835780.0, ans=0.2 2024-08-20 14:19:28,572 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 24 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-20 14:19:33,953 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 33 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-20 14:19:48,782 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4835880.0, ans=0.125 2024-08-20 14:19:56,485 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4835980.0, ans=0.015 2024-08-20 14:20:04,864 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 14 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-20 14:20:08,106 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 20 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-20 14:20:30,576 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.57 vs. limit=15.0 2024-08-20 14:20:47,416 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 9450, loss[loss=0.1369, beats_loss=0.008111, ecapa_loss=0.000134, whisper_loss=0.1275, over 23815.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01039, ecapa_loss=0.0001395, whisper_loss=0.09029, over 3805861.00 frames. 
], batch size: 89, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:20:52,172 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4836280.0, ans=0.125 2024-08-20 14:21:09,850 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.90 vs. limit=15.0 2024-08-20 14:21:11,085 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4836380.0, ans=0.0 2024-08-20 14:21:20,690 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.301e+01 2.582e+01 2.889e+01 4.439e+01, threshold=5.164e+01, percent-clipped=0.0 2024-08-20 14:21:37,849 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4836480.0, ans=0.0 2024-08-20 14:22:20,394 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 15 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-20 14:22:21,456 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 9500, loss[loss=0.08778, beats_loss=0.01224, ecapa_loss=9.265e-05, whisper_loss=0.07462, over 15826.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01042, ecapa_loss=0.0001398, whisper_loss=0.08985, over 3789987.57 frames. ], batch size: 58, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:22:43,045 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4836880.0, ans=0.0 2024-08-20 14:23:21,127 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4837080.0, ans=0.125 2024-08-20 14:23:49,517 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 9550, loss[loss=0.1089, beats_loss=0.008439, ecapa_loss=0.0001838, whisper_loss=0.09859, over 17469.00 frames. 
], tot_loss[loss=0.102, beats_loss=0.01041, ecapa_loss=0.0001397, whisper_loss=0.09023, over 3785265.03 frames. ], batch size: 72, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:23:49,862 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4837280.0, ans=0.0 2024-08-20 14:23:49,978 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4837280.0, ans=0.125 2024-08-20 14:23:51,845 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4837280.0, ans=0.125 2024-08-20 14:24:18,261 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 17 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-20 14:24:21,269 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+01 2.249e+01 2.469e+01 2.805e+01 3.890e+01, threshold=4.937e+01, percent-clipped=0.0 2024-08-20 14:24:23,299 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 26 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-20 14:24:24,204 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-20 14:24:29,161 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.69 vs. limit=15.0 2024-08-20 14:24:48,973 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4837580.0, ans=0.0 2024-08-20 14:24:57,566 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. 
limit=6.0 2024-08-20 14:25:10,961 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4837680.0, ans=0.125 2024-08-20 14:25:19,322 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 9600, loss[loss=0.0939, beats_loss=0.01097, ecapa_loss=0.0001274, whisper_loss=0.08166, over 14913.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01047, ecapa_loss=0.0001392, whisper_loss=0.08941, over 3768397.04 frames. ], batch size: 57, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:25:24,568 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4837780.0, ans=0.125 2024-08-20 14:25:36,443 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.53 vs. limit=10.0 2024-08-20 14:25:54,198 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4837980.0, ans=0.125 2024-08-20 14:26:13,609 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=4838080.0, ans=0.05 2024-08-20 14:26:28,782 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4838180.0, ans=0.2 2024-08-20 14:26:43,663 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4838180.0, ans=0.125 2024-08-20 14:26:48,009 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 9650, loss[loss=0.1108, beats_loss=0.009702, ecapa_loss=0.000149, whisper_loss=0.09964, over 18842.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01039, ecapa_loss=0.0001401, whisper_loss=0.09063, over 3783376.08 frames. 
], batch size: 75, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:26:54,327 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.579e-03 2024-08-20 14:26:58,269 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4838280.0, ans=0.0 2024-08-20 14:27:02,994 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 27 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-20 14:27:04,797 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 32 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-20 14:27:19,274 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 27 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-20 14:27:20,390 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.743e+01 2.272e+01 2.458e+01 2.754e+01 3.251e+01, threshold=4.916e+01, percent-clipped=0.0 2024-08-20 14:27:44,669 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.60 vs. limit=22.5 2024-08-20 14:27:51,428 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4838580.0, ans=0.1 2024-08-20 14:28:08,124 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=4838680.0, ans=0.05 2024-08-20 14:28:18,769 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4838680.0, ans=0.2 2024-08-20 14:28:21,302 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 9700, loss[loss=0.09124, beats_loss=0.01041, ecapa_loss=0.0002097, whisper_loss=0.07873, over 12179.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01029, ecapa_loss=0.0001411, whisper_loss=0.09009, over 3796256.56 frames. 
], batch size: 50, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:28:34,435 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 29 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-20 14:29:08,615 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 28 from LS+wenet, 11 from Vox, 40 fro AS 2024-08-20 14:29:19,969 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 29 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-20 14:29:51,986 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4839180.0, ans=0.0 2024-08-20 14:29:55,251 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 9750, loss[loss=0.1062, beats_loss=0.009474, ecapa_loss=0.000161, whisper_loss=0.09511, over 15911.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01038, ecapa_loss=0.0001406, whisper_loss=0.08928, over 3795266.96 frames. ], batch size: 64, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:30:07,541 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 28 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-20 14:30:10,345 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0 2024-08-20 14:30:23,174 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 
19 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-20 14:30:23,402 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4839380.0, ans=0.0 2024-08-20 14:30:30,211 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.339e+01 2.577e+01 2.928e+01 5.580e+01, threshold=5.154e+01, percent-clipped=1.0 2024-08-20 14:30:34,371 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4839480.0, ans=0.125 2024-08-20 14:30:34,383 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4839480.0, ans=0.125 2024-08-20 14:30:51,417 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 25 from LS+wenet, 30 from Vox, 25 fro AS 2024-08-20 14:30:54,326 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 22 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-20 14:31:05,558 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4839580.0, ans=0.0 2024-08-20 14:31:16,846 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4839680.0, ans=0.0 2024-08-20 14:31:28,880 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 9800, loss[loss=0.115, beats_loss=0.00934, ecapa_loss=0.0001309, whisper_loss=0.1044, over 22569.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01042, ecapa_loss=0.0001394, whisper_loss=0.08877, over 3799259.34 frames. ], batch size: 87, lr: 1.85e-03, grad_scale: 1.152921504606847e+18 2024-08-20 14:31:50,581 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4839880.0, ans=0.0 2024-08-20 14:31:56,656 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.41 vs. 
limit=15.0 2024-08-20 14:32:06,406 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 13 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-20 14:32:06,616 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4839980.0, ans=0.0 2024-08-20 14:32:43,082 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 26 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-20 14:33:02,335 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 27 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-20 14:33:06,175 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4840280.0, ans=0.1 2024-08-20 14:33:06,945 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 9850, loss[loss=0.1123, beats_loss=0.01033, ecapa_loss=0.0001275, whisper_loss=0.1006, over 20071.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01042, ecapa_loss=0.0001391, whisper_loss=0.08939, over 3806204.48 frames. ], batch size: 80, lr: 1.85e-03, grad_scale: 1.152921504606847e+18 2024-08-20 14:33:08,285 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 24 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-20 14:33:42,857 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.298e+01 2.480e+01 2.698e+01 3.610e+01, threshold=4.961e+01, percent-clipped=0.0 2024-08-20 14:34:01,886 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4840480.0, ans=0.07 2024-08-20 14:34:22,448 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=23.70 vs. 
limit=22.5 2024-08-20 14:34:30,992 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4840680.0, ans=0.2 2024-08-20 14:34:46,787 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 9900, loss[loss=0.11, beats_loss=0.00747, ecapa_loss=0.0001267, whisper_loss=0.1013, over 16047.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01047, ecapa_loss=0.0001384, whisper_loss=0.08916, over 3813761.03 frames. ], batch size: 57, lr: 1.85e-03, grad_scale: 1.152921504606847e+18 2024-08-20 14:34:51,405 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 24 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-20 14:34:51,710 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4840780.0, ans=0.07 2024-08-20 14:35:04,458 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 22 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-20 14:35:19,759 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 19 from LS+wenet, 30 from Vox, 25 fro AS 2024-08-20 14:35:32,003 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4840980.0, ans=0.0 2024-08-20 14:35:39,400 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4840980.0, ans=0.125 2024-08-20 14:35:59,206 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4841080.0, ans=0.125 2024-08-20 14:36:25,362 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 9950, loss[loss=0.09259, beats_loss=0.01179, ecapa_loss=0.0001286, whisper_loss=0.07951, over 23639.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01043, ecapa_loss=0.0001392, whisper_loss=0.08922, over 3841050.35 frames. 
], batch size: 94, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:36:59,340 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.238e+01 2.460e+01 2.685e+01 1.158e+02, threshold=4.920e+01, percent-clipped=1.0 2024-08-20 14:37:24,888 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 33 from LS+wenet, 31 from Vox, 27 fro AS 2024-08-20 14:37:28,380 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 25 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-20 14:37:37,349 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4841680.0, ans=0.2 2024-08-20 14:37:42,528 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4841680.0, ans=0.2 2024-08-20 14:37:51,714 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 10000, loss[loss=0.09722, beats_loss=0.009614, ecapa_loss=0.000138, whisper_loss=0.08622, over 15620.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01044, ecapa_loss=0.0001389, whisper_loss=0.08904, over 3805065.58 frames. ], batch size: 62, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:37:54,035 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4841780.0, ans=0.125 2024-08-20 14:37:57,191 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4841780.0, ans=0.0 2024-08-20 14:38:07,982 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 14 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-20 14:38:27,121 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4841980.0, ans=0.125 2024-08-20 14:38:43,820 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.48 vs. 
limit=15.0 2024-08-20 14:38:50,591 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 22 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-20 14:39:16,275 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 19 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-20 14:39:33,011 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 10050, loss[loss=0.1126, beats_loss=0.01115, ecapa_loss=0.0001265, whisper_loss=0.1002, over 22057.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01044, ecapa_loss=0.000139, whisper_loss=0.08944, over 3795841.64 frames. ], batch size: 86, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:40:14,743 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4842380.0, ans=0.125 2024-08-20 14:40:18,405 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.739e+01 2.420e+01 2.635e+01 2.918e+01 2.672e+02, threshold=5.270e+01, percent-clipped=3.0 2024-08-20 14:40:38,190 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4842480.0, ans=0.1 2024-08-20 14:40:40,504 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4842480.0, ans=0.125 2024-08-20 14:40:40,961 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.68 vs. limit=15.0 2024-08-20 14:40:57,543 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4842580.0, ans=0.09899494936611666 2024-08-20 14:41:09,900 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4842680.0, ans=0.1 2024-08-20 14:41:15,975 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 
29 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-20 14:41:33,063 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 10100, loss[loss=0.117, beats_loss=0.01033, ecapa_loss=0.0001518, whisper_loss=0.1052, over 20592.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01046, ecapa_loss=0.0001392, whisper_loss=0.08988, over 3841480.36 frames. ], batch size: 80, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:41:34,568 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4842780.0, ans=0.125 2024-08-20 14:41:46,141 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 29 from LS+wenet, 26 from Vox, 23 fro AS 2024-08-20 14:42:11,542 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=4842880.0, ans=0.5 2024-08-20 14:42:11,583 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4842880.0, ans=0.0 2024-08-20 14:42:15,888 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 20 from LS+wenet, 14 from Vox, 19 fro AS 2024-08-20 14:42:30,890 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 29 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-20 14:43:02,234 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4843080.0, ans=0.125 2024-08-20 14:43:02,587 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.99 vs. limit=15.0 2024-08-20 14:43:08,176 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.71 vs. limit=12.0 2024-08-20 14:43:13,756 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 
23 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-20 14:43:28,195 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 10150, loss[loss=0.1227, beats_loss=0.00982, ecapa_loss=0.0001093, whisper_loss=0.1117, over 23199.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0105, ecapa_loss=0.0001396, whisper_loss=0.08988, over 3833192.94 frames. ], batch size: 88, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:43:39,189 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 15 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-20 14:43:42,230 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.64 vs. limit=15.0 2024-08-20 14:43:45,706 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4843280.0, ans=0.1 2024-08-20 14:43:59,456 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.22 vs. limit=15.0 2024-08-20 14:44:03,276 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.366e+01 2.588e+01 2.870e+01 1.184e+02, threshold=5.177e+01, percent-clipped=1.0 2024-08-20 14:44:11,855 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4843480.0, ans=0.0 2024-08-20 14:44:15,518 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4843480.0, ans=0.09899494936611666 2024-08-20 14:44:32,548 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4843580.0, ans=0.2 2024-08-20 14:44:55,704 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 
18 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-20 14:44:57,143 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 10200, loss[loss=0.07564, beats_loss=0.01373, ecapa_loss=0.0001356, whisper_loss=0.06055, over 19144.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01052, ecapa_loss=0.0001398, whisper_loss=0.08926, over 3817399.88 frames. ], batch size: 82, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:44:57,365 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 28 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-20 14:44:59,799 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4843780.0, ans=0.125 2024-08-20 14:45:07,623 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 30 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-20 14:45:10,857 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 22 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-20 14:45:16,617 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4843880.0, ans=0.125 2024-08-20 14:45:18,001 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4843880.0, ans=0.2 2024-08-20 14:45:41,181 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4843980.0, ans=0.125 2024-08-20 14:45:47,726 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 18 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-20 14:45:50,395 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.30 vs. 
limit=22.5 2024-08-20 14:46:23,718 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4844180.0, ans=0.0 2024-08-20 14:46:27,322 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 10250, loss[loss=0.1029, beats_loss=0.01159, ecapa_loss=0.0001086, whisper_loss=0.09026, over 20374.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01044, ecapa_loss=0.0001402, whisper_loss=0.08987, over 3830743.54 frames. ], batch size: 78, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:46:44,198 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4844280.0, ans=0.125 2024-08-20 14:47:03,906 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.666e+01 2.274e+01 2.551e+01 2.893e+01 4.019e+02, threshold=5.101e+01, percent-clipped=3.0 2024-08-20 14:47:06,520 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4844480.0, ans=0.125 2024-08-20 14:47:11,540 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 25 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-20 14:47:17,238 INFO [train_multi_KD3.py:845] (1/4) A total of 52 cuts. 13 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-20 14:47:19,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4844480.0, ans=0.125 2024-08-20 14:47:23,936 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4844580.0, ans=0.0 2024-08-20 14:47:39,167 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4844580.0, ans=0.1 2024-08-20 14:47:54,822 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.45 vs. 
limit=10.0 2024-08-20 14:48:02,605 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 10300, loss[loss=0.09534, beats_loss=0.01018, ecapa_loss=0.0001696, whisper_loss=0.08347, over 18266.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01037, ecapa_loss=0.0001409, whisper_loss=0.09019, over 3843121.59 frames. ], batch size: 77, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:49:13,586 WARNING [optim.py:496] (1/4) Scaling gradients by 0.02814776450395584, model_norm_threshold=51.010257720947266 2024-08-20 14:49:13,740 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.27, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.746e+05, grad_sumsq=8.746e+05, orig_rms_sq=1.000e+00 2024-08-20 14:49:14,856 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4845080.0, ans=0.0 2024-08-20 14:49:50,235 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 10350, loss[loss=0.1211, beats_loss=0.009458, ecapa_loss=0.000109, whisper_loss=0.1106, over 16748.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01042, ecapa_loss=0.0001407, whisper_loss=0.09051, over 3859696.60 frames. ], batch size: 62, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:49:55,565 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 
30 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-20 14:50:21,337 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4845380.0, ans=0.1 2024-08-20 14:50:37,384 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.346e+01 2.546e+01 2.818e+01 1.812e+03, threshold=5.092e+01, percent-clipped=2.0 2024-08-20 14:50:55,404 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=4845480.0, ans=0.025 2024-08-20 14:51:08,524 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4845580.0, ans=0.1 2024-08-20 14:51:13,245 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4845580.0, ans=0.125 2024-08-20 14:51:36,282 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4845680.0, ans=0.125 2024-08-20 14:51:54,349 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 10400, loss[loss=0.09458, beats_loss=0.01141, ecapa_loss=0.000182, whisper_loss=0.08135, over 14515.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01043, ecapa_loss=0.0001406, whisper_loss=0.08995, over 3837279.80 frames. 
], batch size: 62, lr: 1.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 14:52:20,785 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4845880.0, ans=0.125 2024-08-20 14:52:48,486 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4845980.0, ans=0.125 2024-08-20 14:53:26,291 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4846080.0, ans=0.125 2024-08-20 14:53:28,659 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4846080.0, ans=0.04949747468305833 2024-08-20 14:53:48,710 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 30 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-20 14:53:56,085 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 10450, loss[loss=0.1045, beats_loss=0.009541, ecapa_loss=0.0001088, whisper_loss=0.09386, over 16578.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01032, ecapa_loss=0.0001409, whisper_loss=0.09042, over 3853770.94 frames. ], batch size: 61, lr: 1.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 14:54:01,990 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4846280.0, ans=0.1 2024-08-20 14:54:24,812 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4846380.0, ans=0.0 2024-08-20 14:54:29,795 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.63 vs. limit=15.0 2024-08-20 14:54:35,219 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 
23 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-20 14:54:42,240 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.798e+01 2.294e+01 2.565e+01 2.796e+01 8.122e+01, threshold=5.131e+01, percent-clipped=1.0 2024-08-20 14:54:59,138 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.88 vs. limit=15.0 2024-08-20 14:55:03,247 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.91 vs. limit=15.0 2024-08-20 14:55:03,327 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.60 vs. limit=15.0 2024-08-20 14:55:08,161 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4846580.0, ans=0.125 2024-08-20 14:55:24,608 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4846580.0, ans=0.2 2024-08-20 14:55:25,585 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 22 from LS+wenet, 36 from Vox, 32 fro AS 2024-08-20 14:55:34,815 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 18 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-20 14:55:51,101 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4846680.0, ans=0.0 2024-08-20 14:55:54,027 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 10500, loss[loss=0.09433, beats_loss=0.01011, ecapa_loss=0.0001525, whisper_loss=0.0827, over 17923.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01028, ecapa_loss=0.0001414, whisper_loss=0.09049, over 3843070.08 frames. ], batch size: 70, lr: 1.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 14:56:22,714 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 
32 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-20 14:56:31,889 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 27 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-20 14:56:38,346 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.59 vs. limit=22.5 2024-08-20 14:57:10,938 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 36 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-20 14:57:21,750 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4847080.0, ans=0.125 2024-08-20 14:57:24,064 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.27 vs. limit=15.0 2024-08-20 14:57:37,873 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4847180.0, ans=0.125 2024-08-20 14:57:50,333 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 10550, loss[loss=0.1019, beats_loss=0.008094, ecapa_loss=0.0001675, whisper_loss=0.09211, over 19868.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01035, ecapa_loss=0.0001413, whisper_loss=0.0898, over 3849079.28 frames. ], batch size: 79, lr: 1.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 14:58:01,018 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 24 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-20 14:58:02,791 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4847280.0, ans=0.04949747468305833 2024-08-20 14:58:19,837 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.31 vs. 
limit=22.5 2024-08-20 14:58:30,821 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4847380.0, ans=0.0 2024-08-20 14:58:35,814 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.331e+01 2.577e+01 2.959e+01 5.357e+01, threshold=5.154e+01, percent-clipped=1.0 2024-08-20 14:58:59,107 INFO [train_multi_KD3.py:845] (1/4) A total of 52 cuts. 9 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-20 14:59:06,461 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4847580.0, ans=0.125 2024-08-20 14:59:30,947 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.91 vs. limit=6.0 2024-08-20 14:59:45,940 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 10600, loss[loss=0.09764, beats_loss=0.01059, ecapa_loss=0.0001363, whisper_loss=0.08569, over 17049.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01044, ecapa_loss=0.0001401, whisper_loss=0.08954, over 3846547.61 frames. ], batch size: 67, lr: 1.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 14:59:52,379 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4847780.0, ans=0.125 2024-08-20 14:59:56,557 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.16 vs. limit=22.5 2024-08-20 15:00:10,664 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.73 vs. limit=10.0 2024-08-20 15:01:47,016 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 10650, loss[loss=0.1055, beats_loss=0.01231, ecapa_loss=0.0001259, whisper_loss=0.09192, over 23480.00 frames. 
], tot_loss[loss=0.1006, beats_loss=0.01047, ecapa_loss=0.0001389, whisper_loss=0.08877, over 3823222.73 frames. ], batch size: 94, lr: 1.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:01:49,101 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4848280.0, ans=0.07 2024-08-20 15:02:37,809 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4848480.0, ans=0.0 2024-08-20 15:02:38,649 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.606e+01 2.292e+01 2.599e+01 2.866e+01 5.790e+01, threshold=5.197e+01, percent-clipped=1.0 2024-08-20 15:03:06,093 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 33 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-20 15:03:28,866 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 28 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-20 15:03:54,785 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 10700, loss[loss=0.09186, beats_loss=0.01135, ecapa_loss=0.0001114, whisper_loss=0.0794, over 20955.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01046, ecapa_loss=0.0001389, whisper_loss=0.08916, over 3819792.41 frames. ], batch size: 80, lr: 1.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:04:56,886 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 25 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-20 15:05:01,812 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 33 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-20 15:05:04,836 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 22 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-20 15:05:20,765 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4849080.0, ans=0.0 2024-08-20 15:06:00,153 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 10750, loss[loss=0.07906, beats_loss=0.01163, ecapa_loss=0.0001158, whisper_loss=0.06627, over 23830.00 frames. 
], tot_loss[loss=0.1014, beats_loss=0.01044, ecapa_loss=0.0001388, whisper_loss=0.08962, over 3837292.28 frames. ], batch size: 91, lr: 1.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:06:05,649 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 14 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-20 15:06:09,532 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4849280.0, ans=0.0 2024-08-20 15:06:10,840 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4849280.0, ans=0.125 2024-08-20 15:06:12,189 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4849280.0, ans=0.2 2024-08-20 15:06:19,058 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4849280.0, ans=0.2 2024-08-20 15:06:25,746 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 20 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-20 15:06:27,965 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 16 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-20 15:06:32,498 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 22 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-20 15:06:49,429 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.343e+01 2.598e+01 3.019e+01 8.630e+01, threshold=5.195e+01, percent-clipped=2.0 2024-08-20 15:07:25,765 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 16 from LS+wenet, 29 from Vox, 47 fro AS 2024-08-20 15:07:41,349 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 29 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-20 15:07:52,051 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.86 vs. 
limit=15.0 2024-08-20 15:07:57,574 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 10800, loss[loss=0.09627, beats_loss=0.01075, ecapa_loss=0.0001226, whisper_loss=0.08429, over 22185.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01051, ecapa_loss=0.0001395, whisper_loss=0.08881, over 3825633.90 frames. ], batch size: 88, lr: 1.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:08:06,184 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4849780.0, ans=0.125 2024-08-20 15:08:06,345 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=4849780.0, ans=15.0 2024-08-20 15:08:11,334 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4849780.0, ans=0.1 2024-08-20 15:08:12,397 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 19 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-20 15:08:16,371 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4849780.0, ans=0.04949747468305833 2024-08-20 15:08:27,511 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 18 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-20 15:08:31,329 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.17 vs. limit=15.0 2024-08-20 15:09:11,177 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.63 vs. limit=15.0 2024-08-20 15:09:11,317 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.00 vs. limit=15.0 2024-08-20 15:09:14,435 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 
15 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-20 15:09:37,461 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4850180.0, ans=0.2 2024-08-20 15:09:49,089 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 23 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-20 15:09:53,682 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 10850, loss[loss=0.08055, beats_loss=0.01264, ecapa_loss=0.0001369, whisper_loss=0.06654, over 21157.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01054, ecapa_loss=0.0001376, whisper_loss=0.08838, over 3810777.20 frames. ], batch size: 88, lr: 1.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:10:42,646 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.741e+01 2.250e+01 2.553e+01 2.783e+01 2.694e+02, threshold=5.105e+01, percent-clipped=1.0 2024-08-20 15:10:46,845 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4850480.0, ans=0.2 2024-08-20 15:11:20,751 INFO [train_multi_KD3.py:845] (1/4) A total of 49 cuts. 13 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-20 15:11:26,888 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 11 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-20 15:11:47,156 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4850680.0, ans=0.125 2024-08-20 15:11:57,402 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 10900, loss[loss=0.0925, beats_loss=0.009089, ecapa_loss=0.0001733, whisper_loss=0.08168, over 20210.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01035, ecapa_loss=0.0001379, whisper_loss=0.08941, over 3811978.47 frames. ], batch size: 83, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:12:12,657 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 
18 from LS+wenet, 25 from Vox, 21 fro AS 2024-08-20 15:12:17,636 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 16 from LS+wenet, 28 from Vox, 25 fro AS 2024-08-20 15:12:24,672 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 30 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-20 15:13:08,469 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 29 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-20 15:13:11,562 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4851080.0, ans=0.125 2024-08-20 15:13:53,600 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4851280.0, ans=0.125 2024-08-20 15:13:54,338 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 10950, loss[loss=0.09172, beats_loss=0.01089, ecapa_loss=0.0001721, whisper_loss=0.07911, over 15472.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01034, ecapa_loss=0.0001379, whisper_loss=0.09014, over 3822868.85 frames. ], batch size: 64, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:14:01,389 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 24 from LS+wenet, 29 from Vox, 23 fro AS 2024-08-20 15:14:03,130 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4851280.0, ans=0.125 2024-08-20 15:14:17,039 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.97 vs. limit=15.0 2024-08-20 15:14:19,921 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4851380.0, ans=0.125 2024-08-20 15:14:29,050 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 
25 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-20 15:14:40,422 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.816e+01 2.141e+01 2.427e+01 2.900e+01 4.434e+01, threshold=4.855e+01, percent-clipped=0.0 2024-08-20 15:14:45,895 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 13 from LS+wenet, 8 from Vox, 34 fro AS 2024-08-20 15:15:02,486 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4851480.0, ans=0.1 2024-08-20 15:15:03,817 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 18 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-20 15:15:11,670 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4851580.0, ans=0.125 2024-08-20 15:15:12,674 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 23 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-20 15:15:13,303 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4851580.0, ans=0.125 2024-08-20 15:15:38,701 WARNING [optim.py:496] (1/4) Scaling gradients by 0.06528465449810028, model_norm_threshold=48.54976272583008 2024-08-20 15:15:38,860 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.33, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.799e+05, grad_sumsq=1.799e+05, orig_rms_sq=1.000e+00 2024-08-20 15:15:53,479 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 11000, loss[loss=0.1286, beats_loss=0.008348, ecapa_loss=0.0001347, whisper_loss=0.1189, over 14022.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01054, ecapa_loss=0.0001383, whisper_loss=0.08922, over 3774646.17 frames. 
], batch size: 53, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:16:19,333 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2024-08-20 15:16:22,227 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 22 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-20 15:16:32,373 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 22 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-20 15:16:59,337 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4851980.0, ans=0.125 2024-08-20 15:17:03,989 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.96 vs. limit=22.5 2024-08-20 15:17:15,668 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 16 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-20 15:17:29,691 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 18 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-20 15:17:30,940 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.07 vs. limit=15.0 2024-08-20 15:17:47,110 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 11050, loss[loss=0.1025, beats_loss=0.0111, ecapa_loss=0.0001115, whisper_loss=0.09025, over 19005.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01048, ecapa_loss=0.0001391, whisper_loss=0.08985, over 3792155.52 frames. 
], batch size: 73, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:17:50,918 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4852280.0, ans=0.1 2024-08-20 15:17:58,222 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4852280.0, ans=0.0 2024-08-20 15:18:03,047 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4852280.0, ans=0.125 2024-08-20 15:18:16,109 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 16 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-20 15:18:35,024 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.718e+01 2.285e+01 2.516e+01 2.757e+01 7.437e+02, threshold=5.033e+01, percent-clipped=2.0 2024-08-20 15:19:29,081 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4852680.0, ans=0.2 2024-08-20 15:19:43,608 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.97 vs. limit=15.0 2024-08-20 15:19:46,485 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 11100, loss[loss=0.1343, beats_loss=0.007888, ecapa_loss=0.000151, whisper_loss=0.1249, over 23078.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01046, ecapa_loss=0.0001389, whisper_loss=0.09032, over 3828244.81 frames. 
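The WARNING at 15:15:38 reports the factor by which gradients are rescaled once the gradient norm exceeds the clipping threshold. The implied norm, 48.5498 / 0.0652847 ≈ 743.7, reappears as the 7.437e+02 maximum in the grad-norm quartiles logged at batch 11050 (percent-clipped=2.0). A sketch of that relationship (not the actual icefall `optim.py` implementation):

```python
def clipping_scale(grad_norm, model_norm_threshold):
    # Factor applied to gradients when grad_norm exceeds the threshold,
    # as in "Scaling gradients by S, model_norm_threshold=T".
    # Sketch only; icefall's optim.py computes this differently in detail.
    return min(1.0, model_norm_threshold / grad_norm)
```

With grad_norm = 743.7 and threshold = 48.5498 this yields ≈ 0.0653, the scale printed in the warning; norms below the threshold are left unscaled.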
], batch size: 93, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:19:55,037 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4852780.0, ans=0.125 2024-08-20 15:19:57,895 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4852780.0, ans=0.1 2024-08-20 15:20:20,974 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4852880.0, ans=0.125 2024-08-20 15:20:26,468 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.86 vs. limit=15.0 2024-08-20 15:20:43,065 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 34 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-20 15:20:48,111 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4852980.0, ans=0.07 2024-08-20 15:21:17,723 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.16 vs. limit=15.0 2024-08-20 15:21:44,378 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 11150, loss[loss=0.09937, beats_loss=0.013, ecapa_loss=0.0001506, whisper_loss=0.08487, over 20973.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01048, ecapa_loss=0.0001392, whisper_loss=0.09033, over 3890265.83 frames. ], batch size: 88, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:21:59,248 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4853280.0, ans=0.0 2024-08-20 15:22:21,602 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 
27 from LS+wenet, 32 from Vox, 30 fro AS 2024-08-20 15:22:30,930 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.306e+01 2.527e+01 2.770e+01 3.887e+01, threshold=5.054e+01, percent-clipped=0.0 2024-08-20 15:22:41,710 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4853480.0, ans=0.125 2024-08-20 15:22:42,782 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 21 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-20 15:23:12,076 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.24 vs. limit=15.0 2024-08-20 15:23:33,137 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4853680.0, ans=0.125 2024-08-20 15:23:37,792 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4853680.0, ans=0.0 2024-08-20 15:23:38,941 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 41 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-20 15:23:46,553 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 11200, loss[loss=0.09749, beats_loss=0.01052, ecapa_loss=0.0001232, whisper_loss=0.08574, over 14417.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01047, ecapa_loss=0.0001394, whisper_loss=0.09021, over 3884944.68 frames. ], batch size: 54, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:24:05,305 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4853780.0, ans=0.1 2024-08-20 15:24:10,308 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4853880.0, ans=0.0 2024-08-20 15:24:46,988 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 
16 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-20 15:24:59,607 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 21 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-20 15:25:00,844 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4853980.0, ans=0.0 2024-08-20 15:25:13,806 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4854080.0, ans=0.0 2024-08-20 15:25:34,258 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=7.270e+00 2024-08-20 15:25:56,997 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 11250, loss[loss=0.0942, beats_loss=0.01084, ecapa_loss=0.0001147, whisper_loss=0.08222, over 22120.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01043, ecapa_loss=0.0001399, whisper_loss=0.09023, over 3873574.23 frames. ], batch size: 87, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:26:07,991 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 36 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-20 15:26:32,874 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4854380.0, ans=0.125 2024-08-20 15:26:45,125 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.343e+01 2.555e+01 2.874e+01 4.205e+01, threshold=5.110e+01, percent-clipped=0.0 2024-08-20 15:26:46,904 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4854480.0, ans=0.1 2024-08-20 15:26:52,892 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 24 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-20 15:27:41,475 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 
20 from LS+wenet, 9 from Vox, 26 fro AS 2024-08-20 15:27:42,539 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4854680.0, ans=0.125 2024-08-20 15:27:48,196 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 20 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-20 15:27:54,902 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4854680.0, ans=0.125 2024-08-20 15:27:57,211 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4854780.0, ans=0.2 2024-08-20 15:27:58,053 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 11300, loss[loss=0.08564, beats_loss=0.01184, ecapa_loss=0.0001053, whisper_loss=0.07275, over 14583.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01042, ecapa_loss=0.0001398, whisper_loss=0.08963, over 3829851.63 frames. ], batch size: 55, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:28:00,766 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 23 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-20 15:28:00,900 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4854780.0, ans=0.125 2024-08-20 15:28:42,131 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.18 vs. limit=15.0 2024-08-20 15:28:46,624 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4854980.0, ans=0.0 2024-08-20 15:28:49,887 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 
21 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-20 15:28:51,462 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4854980.0, ans=0.125 2024-08-20 15:29:07,938 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4854980.0, ans=0.125 2024-08-20 15:29:09,690 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4855080.0, ans=0.0 2024-08-20 15:29:24,320 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4855080.0, ans=0.0 2024-08-20 15:29:27,596 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 15 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-20 15:29:59,443 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 11350, loss[loss=0.09338, beats_loss=0.00917, ecapa_loss=0.0001059, whisper_loss=0.08315, over 15034.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01044, ecapa_loss=0.0001395, whisper_loss=0.08953, over 3834207.85 frames. ], batch size: 55, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:30:02,180 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 24 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-20 15:30:28,276 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4855380.0, ans=0.0 2024-08-20 15:30:41,339 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.27 vs. 
limit=22.5 2024-08-20 15:30:43,264 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4855380.0, ans=0.0 2024-08-20 15:30:49,351 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.328e+01 2.511e+01 2.770e+01 2.674e+02, threshold=5.022e+01, percent-clipped=3.0 2024-08-20 15:30:56,966 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 32 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-20 15:31:01,575 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.29 vs. limit=22.5 2024-08-20 15:31:23,999 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 21 from LS+wenet, 13 from Vox, 17 fro AS 2024-08-20 15:32:03,234 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 11400, loss[loss=0.1085, beats_loss=0.01055, ecapa_loss=0.0001707, whisper_loss=0.09629, over 22356.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0103, ecapa_loss=0.0001399, whisper_loss=0.09111, over 3845735.96 frames. ], batch size: 94, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:32:14,630 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.86 vs. limit=12.0 2024-08-20 15:32:45,931 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.71 vs. limit=10.0 2024-08-20 15:32:57,527 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 13 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-20 15:33:03,852 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 
20 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-20 15:33:10,058 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4855980.0, ans=0.125 2024-08-20 15:33:27,176 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4856080.0, ans=0.1 2024-08-20 15:33:59,737 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.21 vs. limit=12.0 2024-08-20 15:34:02,582 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 11450, loss[loss=0.09721, beats_loss=0.01002, ecapa_loss=0.000142, whisper_loss=0.08577, over 22037.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01034, ecapa_loss=0.00014, whisper_loss=0.09093, over 3836030.97 frames. ], batch size: 89, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:34:06,886 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4856280.0, ans=0.125 2024-08-20 15:34:15,197 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4856280.0, ans=0.0 2024-08-20 15:34:30,815 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 18 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-20 15:34:32,774 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.72 vs. limit=15.0 2024-08-20 15:34:47,866 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4856380.0, ans=0.0 2024-08-20 15:34:53,074 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.356e+01 2.634e+01 3.043e+01 3.885e+01, threshold=5.268e+01, percent-clipped=0.0 2024-08-20 15:35:15,986 INFO [train_multi_KD3.py:845] (1/4) A total of 54 cuts. 
21 from LS+wenet, 10 from Vox, 23 fro AS 2024-08-20 15:35:27,388 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0 2024-08-20 15:35:34,688 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4856580.0, ans=0.125 2024-08-20 15:36:02,519 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 11500, loss[loss=0.1062, beats_loss=0.01032, ecapa_loss=0.0001545, whisper_loss=0.09431, over 22721.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01034, ecapa_loss=0.0001405, whisper_loss=0.09075, over 3847370.51 frames. ], batch size: 92, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:36:14,768 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.73 vs. limit=22.5 2024-08-20 15:36:25,193 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4856880.0, ans=0.0 2024-08-20 15:36:45,191 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.08 vs. limit=10.0 2024-08-20 15:36:50,959 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4856980.0, ans=0.0 2024-08-20 15:37:04,288 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4856980.0, ans=0.125 2024-08-20 15:37:09,988 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 
24 from LS+wenet, 24 from Vox, 21 fro AS 2024-08-20 15:37:20,010 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4857080.0, ans=0.035 2024-08-20 15:37:23,884 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4857080.0, ans=0.015 2024-08-20 15:37:53,874 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 11550, loss[loss=0.104, beats_loss=0.006347, ecapa_loss=0.0001595, whisper_loss=0.09602, over 13763.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01029, ecapa_loss=0.0001401, whisper_loss=0.0915, over 3864704.69 frames. ], batch size: 54, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:37:57,660 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4857280.0, ans=0.125 2024-08-20 15:38:38,636 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 33 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-20 15:38:40,529 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.202e+01 2.508e+01 2.840e+01 4.143e+01, threshold=5.016e+01, percent-clipped=0.0 2024-08-20 15:38:57,124 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 19 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-20 15:39:16,733 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 11 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-20 15:39:39,295 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4857680.0, ans=0.0 2024-08-20 15:39:47,008 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 11600, loss[loss=0.0945, beats_loss=0.009417, ecapa_loss=0.0001661, whisper_loss=0.08342, over 17840.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01036, ecapa_loss=0.0001407, whisper_loss=0.09097, over 3864374.22 frames. 
], batch size: 73, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:39:59,792 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.79 vs. limit=15.0 2024-08-20 15:40:02,332 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 14 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-20 15:40:14,544 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.49 vs. limit=15.0 2024-08-20 15:40:25,618 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4857880.0, ans=0.07 2024-08-20 15:40:25,643 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4857880.0, ans=0.2 2024-08-20 15:40:25,686 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4857880.0, ans=0.09899494936611666 2024-08-20 15:40:30,504 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 32 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-20 15:41:03,555 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.104e+05 2024-08-20 15:41:21,342 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 16 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-20 15:41:29,168 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4858180.0, ans=0.0 2024-08-20 15:41:29,272 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4858180.0, ans=0.125 2024-08-20 15:41:36,337 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 11650, loss[loss=0.1332, beats_loss=0.00922, ecapa_loss=0.0001351, whisper_loss=0.1227, over 20337.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01036, ecapa_loss=0.0001409, whisper_loss=0.0908, over 3815071.45 frames. ], batch size: 76, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:41:57,221 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 18 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-20 15:42:17,183 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 23 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-20 15:42:24,124 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4858480.0, ans=0.0 2024-08-20 15:42:24,938 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.221e+01 2.529e+01 2.912e+01 8.219e+01, threshold=5.058e+01, percent-clipped=1.0 2024-08-20 15:43:34,402 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 11700, loss[loss=0.1021, beats_loss=0.01252, ecapa_loss=0.0001282, whisper_loss=0.08828, over 22080.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01031, ecapa_loss=0.000141, whisper_loss=0.09156, over 3859713.10 frames. ], batch size: 90, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:44:24,030 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 20 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-20 15:44:30,244 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.96 vs. limit=15.0 2024-08-20 15:44:54,270 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4859080.0, ans=0.125 2024-08-20 15:44:56,443 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4859080.0, ans=0.0 2024-08-20 15:45:04,234 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 19 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-20 15:45:17,177 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 
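The `tot_loss[... over N frames]` entries are a running aggregate of recent batch losses weighted by frame count. The fractional frame totals (e.g. 3815071.45) suggest icefall additionally applies an exponential decay to older batches; the sketch below keeps only the plain frame-weighted averaging and omits that decay (`update_tot_loss` is a hypothetical helper):

```python
def update_tot_loss(tot, tot_frames, batch_loss, batch_frames):
    # Frame-weighted running average of the loss, in the spirit of the
    # "tot_loss[... over N frames]" bookkeeping. Simplified: the real
    # tracker appears to decay old frames, which this sketch omits.
    new_frames = tot_frames + batch_frames
    new_tot = (tot * tot_frames + batch_loss * batch_frames) / new_frames
    return new_tot, new_frames
```

A batch with twice the frames of the running total moves the average two-thirds of the way toward the new batch loss, which is why large batches dominate these updates.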
23 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-20 15:45:22,445 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 20 from LS+wenet, 11 from Vox, 22 fro AS 2024-08-20 15:45:26,830 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4859280.0, ans=0.0 2024-08-20 15:45:27,530 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 11750, loss[loss=0.09702, beats_loss=0.008138, ecapa_loss=0.0001753, whisper_loss=0.08713, over 15137.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01029, ecapa_loss=0.0001399, whisper_loss=0.09155, over 3861614.42 frames. ], batch size: 61, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:46:10,689 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.940e+01 2.330e+01 2.512e+01 2.808e+01 3.989e+01, threshold=5.024e+01, percent-clipped=0.0 2024-08-20 15:46:30,237 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4859480.0, ans=0.125 2024-08-20 15:46:41,077 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 15:46:50,059 WARNING [optim.py:496] (1/4) Scaling gradients by 0.03734064847230911, model_norm_threshold=50.2408561706543 2024-08-20 15:46:50,217 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.15, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.797e+05, grad_sumsq=2.797e+05, orig_rms_sq=1.000e+00 2024-08-20 15:46:59,214 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4859680.0, ans=0.125 2024-08-20 15:47:14,973 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 11800, loss[loss=0.09036, beats_loss=0.01229, ecapa_loss=0.0001246, whisper_loss=0.07682, over 19970.00 frames. 
], tot_loss[loss=0.1029, beats_loss=0.01032, ecapa_loss=0.00014, whisper_loss=0.09118, over 3858213.60 frames. ], batch size: 82, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:47:22,984 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.11 vs. limit=10.0 2024-08-20 15:48:12,542 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4859980.0, ans=0.2 2024-08-20 15:48:12,580 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4859980.0, ans=0.125 2024-08-20 15:48:25,931 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4860080.0, ans=0.125 2024-08-20 15:48:58,011 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 11850, loss[loss=0.1072, beats_loss=0.01014, ecapa_loss=0.0001296, whisper_loss=0.09571, over 22852.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01037, ecapa_loss=0.0001401, whisper_loss=0.09015, over 3854567.41 frames. ], batch size: 91, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:49:12,848 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.03 vs. limit=15.0 2024-08-20 15:49:24,631 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4860380.0, ans=0.2 2024-08-20 15:49:36,767 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.601e+01 2.263e+01 2.480e+01 2.849e+01 1.345e+03, threshold=4.961e+01, percent-clipped=1.0 2024-08-20 15:49:41,198 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 
16 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-20 15:49:44,555 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4860480.0, ans=0.0 2024-08-20 15:49:52,112 INFO [train_multi_KD3.py:845] (1/4) A total of 52 cuts. 15 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-20 15:50:05,144 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.02 vs. limit=10.0 2024-08-20 15:50:06,833 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.65 vs. limit=22.5 2024-08-20 15:50:17,053 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.72 vs. limit=5.0 2024-08-20 15:50:31,632 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 32 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-20 15:50:38,960 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 11900, loss[loss=0.1075, beats_loss=0.01187, ecapa_loss=0.0001464, whisper_loss=0.0942, over 21695.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01042, ecapa_loss=0.0001408, whisper_loss=0.08991, over 3820612.99 frames. 
], batch size: 90, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:51:04,971 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4860880.0, ans=0.2 2024-08-20 15:51:21,940 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4860980.0, ans=0.0 2024-08-20 15:51:36,282 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4860980.0, ans=0.2 2024-08-20 15:51:38,123 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4860980.0, ans=0.1 2024-08-20 15:51:44,115 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 29 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-20 15:51:52,077 INFO [train_multi_KD3.py:845] (1/4) A total of 52 cuts. 14 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-20 15:52:22,946 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 11950, loss[loss=0.1143, beats_loss=0.01145, ecapa_loss=0.0001188, whisper_loss=0.1017, over 22855.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01043, ecapa_loss=0.0001411, whisper_loss=0.0902, over 3828553.74 frames. ], batch size: 91, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:52:32,654 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4861280.0, ans=0.0 2024-08-20 15:52:43,242 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4861380.0, ans=0.125 2024-08-20 15:52:51,332 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 
18 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-20 15:52:54,160 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4861380.0, ans=0.1 2024-08-20 15:53:06,036 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.535e+01 2.342e+01 2.523e+01 2.820e+01 2.544e+02, threshold=5.046e+01, percent-clipped=1.0 2024-08-20 15:53:10,030 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4861480.0, ans=0.0 2024-08-20 15:53:39,886 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4861580.0, ans=0.0 2024-08-20 15:53:51,966 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 31 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-20 15:54:01,411 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 31 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-20 15:54:13,279 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 12000, loss[loss=0.1249, beats_loss=0.01024, ecapa_loss=0.0001334, whisper_loss=0.1133, over 23289.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01053, ecapa_loss=0.0001406, whisper_loss=0.09035, over 3858950.38 frames. ], batch size: 91, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:54:13,280 INFO [train_multi_KD3.py:1140] (1/4) Computing validation loss 2024-08-20 15:54:48,598 INFO [train_multi_KD3.py:1150] (1/4) Epoch 33, validation on ASR_libri: loss=0.2555, beats_loss=0, ecapa_loss=0.000501, whisper_loss=0.2505, over 931116.00 frames. 2024-08-20 15:55:14,043 INFO [train_multi_KD3.py:1150] (1/4) Epoch 33, validation on SV_voxceleb1: loss=0.003892, beats_loss=0, ecapa_loss=0.0003892, whisper_loss=0, over 944235.00 frames. 2024-08-20 15:56:55,470 INFO [train_multi_KD3.py:1150] (1/4) Epoch 33, validation on AT_audioset: loss=0.02299, beats_loss=0.02299, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-20 15:56:55,474 INFO [train_multi_KD3.py:1156] (1/4) Maximum memory allocated so far is 30838MB 2024-08-20 15:56:58,581 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 26 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-20 15:57:24,640 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 15:57:32,295 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4861980.0, ans=0.125 2024-08-20 15:57:44,659 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4862080.0, ans=0.0 2024-08-20 15:58:12,066 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4862180.0, ans=0.125 2024-08-20 15:58:19,396 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 12050, loss[loss=0.08593, beats_loss=0.01028, ecapa_loss=0.0001529, whisper_loss=0.07412, over 13853.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01045, ecapa_loss=0.0001401, whisper_loss=0.09046, over 3894192.86 frames. ], batch size: 51, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:58:32,821 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 15 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-20 15:58:37,472 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4862380.0, ans=0.2 2024-08-20 15:58:48,681 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 22 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-20 15:58:53,547 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.378e+01 2.665e+01 2.948e+01 5.073e+01, threshold=5.329e+01, percent-clipped=1.0 2024-08-20 15:58:58,266 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 
18 from LS+wenet, 32 from Vox, 44 fro AS 2024-08-20 15:59:12,798 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4862580.0, ans=0.125 2024-08-20 15:59:13,838 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 19 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-20 15:59:16,484 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4862580.0, ans=0.125 2024-08-20 15:59:19,881 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4862580.0, ans=0.125 2024-08-20 15:59:31,693 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4862680.0, ans=0.1 2024-08-20 15:59:34,821 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4862680.0, ans=0.0 2024-08-20 15:59:41,813 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.03 vs. limit=15.0 2024-08-20 15:59:44,449 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 12100, loss[loss=0.1207, beats_loss=0.007538, ecapa_loss=0.0001334, whisper_loss=0.1118, over 18480.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01043, ecapa_loss=0.0001398, whisper_loss=0.09057, over 3892731.39 frames. ], batch size: 70, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 16:00:20,887 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4862980.0, ans=0.0 2024-08-20 16:00:24,455 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4862980.0, ans=0.0 2024-08-20 16:00:35,722 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 
26 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-20 16:00:42,456 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4863080.0, ans=0.125 2024-08-20 16:00:48,989 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.95 vs. limit=15.0 2024-08-20 16:01:00,033 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.54 vs. limit=12.0 2024-08-20 16:01:01,126 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4863180.0, ans=0.0 2024-08-20 16:01:07,083 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 12150, loss[loss=0.08886, beats_loss=0.01017, ecapa_loss=0.000127, whisper_loss=0.07742, over 13670.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01038, ecapa_loss=0.0001401, whisper_loss=0.09056, over 3882130.38 frames. ], batch size: 53, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 16:01:15,806 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4863280.0, ans=0.125 2024-08-20 16:01:31,839 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.418e+05 2024-08-20 16:01:32,187 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.99 vs. limit=12.0 2024-08-20 16:01:39,693 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.942e+01 2.292e+01 2.549e+01 2.868e+01 6.331e+01, threshold=5.097e+01, percent-clipped=1.0 2024-08-20 16:01:39,894 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 22 from LS+wenet, 13 from Vox, 56 fro AS 2024-08-20 16:01:55,908 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 
13 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-20 16:02:22,939 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=15.57 vs. limit=15.0 2024-08-20 16:02:28,282 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 12200, loss[loss=0.09414, beats_loss=0.012, ecapa_loss=0.0001292, whisper_loss=0.08085, over 20699.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0104, ecapa_loss=0.0001395, whisper_loss=0.08973, over 3836060.23 frames. ], batch size: 79, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 16:02:33,207 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 20 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-20 16:02:33,468 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4863780.0, ans=0.125 2024-08-20 16:02:44,837 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 21 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-20 16:02:49,690 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 31 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-20 16:03:08,754 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 29 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-20 16:03:14,844 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4863980.0, ans=0.125 2024-08-20 16:03:18,476 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.39 vs. 
limit=15.0 2024-08-20 16:03:29,592 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4864080.0, ans=0.125 2024-08-20 16:03:32,692 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4864180.0, ans=0.125 2024-08-20 16:03:38,660 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 15 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-20 16:03:49,143 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4864280.0, ans=0.5 2024-08-20 16:03:49,860 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 12250, loss[loss=0.1059, beats_loss=0.01055, ecapa_loss=0.000111, whisper_loss=0.0942, over 15307.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01037, ecapa_loss=0.000138, whisper_loss=0.09072, over 3840840.49 frames. ], batch size: 56, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 16:04:09,946 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 25 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-20 16:04:15,167 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.37 vs. 
limit=15.0 2024-08-20 16:04:17,809 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4864380.0, ans=0.125 2024-08-20 16:04:21,818 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.806e+01 2.269e+01 2.404e+01 2.750e+01 9.360e+01, threshold=4.808e+01, percent-clipped=1.0 2024-08-20 16:04:33,560 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4864480.0, ans=0.125 2024-08-20 16:05:11,457 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.04 vs. limit=15.0 2024-08-20 16:05:11,854 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 12300, loss[loss=0.09517, beats_loss=0.01077, ecapa_loss=0.0001681, whisper_loss=0.08271, over 21079.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0104, ecapa_loss=0.0001383, whisper_loss=0.09052, over 3832065.79 frames. ], batch size: 92, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 16:05:12,685 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.29 vs. 
limit=22.5 2024-08-20 16:05:19,466 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4864780.0, ans=0.125 2024-08-20 16:05:36,379 WARNING [optim.py:496] (1/4) Scaling gradients by 0.05332305282354355, model_norm_threshold=48.08091354370117 2024-08-20 16:05:36,538 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.333e+05, grad_sumsq=1.333e+05, orig_rms_sq=1.000e+00 2024-08-20 16:05:42,432 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.96 vs. limit=15.0 2024-08-20 16:05:43,545 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 19 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-20 16:05:50,496 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.72 vs. limit=22.5 2024-08-20 16:05:56,694 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4864980.0, ans=0.1 2024-08-20 16:06:06,156 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 19 from LS+wenet, 28 from Vox, 24 fro AS 2024-08-20 16:06:09,855 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4865080.0, ans=0.1 2024-08-20 16:06:22,530 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 33 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-20 16:06:34,542 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 12350, loss[loss=0.1012, beats_loss=0.01205, ecapa_loss=0.0001266, whisper_loss=0.08791, over 22618.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01068, ecapa_loss=0.0001388, whisper_loss=0.09035, over 3807766.54 frames. 
], batch size: 91, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 16:06:43,455 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.16 vs. limit=22.5 2024-08-20 16:07:08,943 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.807e+01 2.318e+01 2.528e+01 2.855e+01 9.017e+02, threshold=5.055e+01, percent-clipped=1.0 2024-08-20 16:07:10,785 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4865480.0, ans=0.125 2024-08-20 16:07:21,228 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4865480.0, ans=0.125 2024-08-20 16:07:50,405 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4865680.0, ans=0.125 2024-08-20 16:07:54,079 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4865680.0, ans=0.1 2024-08-20 16:08:00,603 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 12400, loss[loss=0.1007, beats_loss=0.009051, ecapa_loss=0.0001613, whisper_loss=0.09, over 15861.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01061, ecapa_loss=0.0001388, whisper_loss=0.0904, over 3762028.64 frames. ], batch size: 65, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:08:09,723 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.81 vs. limit=15.0 2024-08-20 16:08:14,385 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 31 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-20 16:08:23,180 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 
17 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-20 16:08:37,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=4865880.0, ans=15.0 2024-08-20 16:08:38,946 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4865980.0, ans=0.0 2024-08-20 16:09:07,384 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 21 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-20 16:09:28,367 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 22 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-20 16:09:39,950 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 12450, loss[loss=0.07259, beats_loss=0.01054, ecapa_loss=0.0001431, whisper_loss=0.06062, over 20722.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01058, ecapa_loss=0.000139, whisper_loss=0.09016, over 3785526.40 frames. ], batch size: 85, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:09:41,551 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 16:09:54,063 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4866280.0, ans=0.0 2024-08-20 16:10:08,653 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 34 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-20 16:10:22,389 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.270e+01 2.513e+01 2.843e+01 4.408e+01, threshold=5.027e+01, percent-clipped=0.0 2024-08-20 16:10:43,883 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4866580.0, ans=0.125 2024-08-20 16:10:56,844 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 33 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-20 16:10:59,053 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 
26 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-20 16:11:04,056 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4866680.0, ans=0.0 2024-08-20 16:11:08,423 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4866680.0, ans=0.2 2024-08-20 16:11:23,876 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 12500, loss[loss=0.09751, beats_loss=0.01147, ecapa_loss=0.0001293, whisper_loss=0.08475, over 22304.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01058, ecapa_loss=0.0001387, whisper_loss=0.0903, over 3802357.35 frames. ], batch size: 90, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:11:59,612 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4866880.0, ans=0.125 2024-08-20 16:11:59,692 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4866880.0, ans=0.125 2024-08-20 16:12:06,799 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4866980.0, ans=0.0 2024-08-20 16:12:13,336 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4866980.0, ans=0.125 2024-08-20 16:12:16,175 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.17 vs. 
limit=12.0 2024-08-20 16:12:18,047 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4866980.0, ans=0.0 2024-08-20 16:12:25,216 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4866980.0, ans=0.0 2024-08-20 16:13:15,591 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 12550, loss[loss=0.09702, beats_loss=0.01158, ecapa_loss=0.0001387, whisper_loss=0.08405, over 22192.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0106, ecapa_loss=0.0001371, whisper_loss=0.08981, over 3816938.69 frames. ], batch size: 93, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:13:15,809 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 20 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-20 16:13:25,568 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4867280.0, ans=0.125 2024-08-20 16:13:29,970 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 21 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-20 16:13:33,755 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4867280.0, ans=0.0 2024-08-20 16:13:42,166 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 25 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-20 16:13:42,975 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4867380.0, ans=0.1 2024-08-20 16:13:50,512 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.57 vs. limit=15.0 2024-08-20 16:13:55,618 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.19 vs. 
limit=22.5 2024-08-20 16:14:02,668 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.459e+01 2.718e+01 3.101e+01 5.496e+01, threshold=5.435e+01, percent-clipped=1.0 2024-08-20 16:14:43,806 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 34 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-20 16:15:13,163 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 12600, loss[loss=0.1165, beats_loss=0.01062, ecapa_loss=0.0001197, whisper_loss=0.1047, over 23641.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01052, ecapa_loss=0.0001381, whisper_loss=0.09037, over 3834624.65 frames. ], batch size: 94, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:15:21,505 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4867780.0, ans=0.0 2024-08-20 16:15:32,907 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 20 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-20 16:15:50,649 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.21 vs. limit=15.0 2024-08-20 16:16:42,947 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 29 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-20 16:16:44,311 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.17 vs. limit=15.0 2024-08-20 16:16:56,445 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4868180.0, ans=0.1 2024-08-20 16:17:03,823 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 26 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-20 16:17:05,904 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 12650, loss[loss=0.09221, beats_loss=0.01192, ecapa_loss=0.0001442, whisper_loss=0.07884, over 21419.00 frames. 
], tot_loss[loss=0.1024, beats_loss=0.01041, ecapa_loss=0.0001386, whisper_loss=0.0906, over 3836144.31 frames. ], batch size: 89, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:17:26,916 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 28 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-20 16:17:37,540 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.00 vs. limit=15.0 2024-08-20 16:17:39,898 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.22 vs. limit=15.0 2024-08-20 16:17:51,142 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.761e+01 2.312e+01 2.541e+01 2.719e+01 3.789e+01, threshold=5.083e+01, percent-clipped=0.0 2024-08-20 16:18:56,672 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 28 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-20 16:18:58,741 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 12700, loss[loss=0.1043, beats_loss=0.011, ecapa_loss=0.0001549, whisper_loss=0.09173, over 21800.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01036, ecapa_loss=0.0001391, whisper_loss=0.09047, over 3849921.29 frames. ], batch size: 91, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:19:04,378 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 
26 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-20 16:19:13,839 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=4868780.0, ans=0.5 2024-08-20 16:19:31,268 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4868880.0, ans=0.125 2024-08-20 16:19:55,982 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4868980.0, ans=0.125 2024-08-20 16:20:07,469 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4869080.0, ans=0.125 2024-08-20 16:20:13,169 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 24 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-20 16:20:14,258 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4869080.0, ans=0.0 2024-08-20 16:20:17,300 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 31 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-20 16:20:51,319 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 12750, loss[loss=0.1103, beats_loss=0.007571, ecapa_loss=0.0001211, whisper_loss=0.1015, over 14327.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01042, ecapa_loss=0.0001397, whisper_loss=0.08976, over 3829220.03 frames. ], batch size: 53, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:20:59,973 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 29 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-20 16:21:03,706 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4869280.0, ans=0.025 2024-08-20 16:21:20,623 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 
27 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-20 16:21:35,112 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.927e+01 2.357e+01 2.635e+01 3.039e+01 5.268e+01, threshold=5.270e+01, percent-clipped=2.0 2024-08-20 16:21:35,385 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 15 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-20 16:21:46,324 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.58 vs. limit=15.0 2024-08-20 16:22:07,933 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4869580.0, ans=0.1 2024-08-20 16:22:07,958 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=4.339e+00 2024-08-20 16:22:30,662 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4869680.0, ans=0.1 2024-08-20 16:22:33,779 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4869680.0, ans=0.0 2024-08-20 16:22:36,634 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 12800, loss[loss=0.08755, beats_loss=0.01194, ecapa_loss=0.0001462, whisper_loss=0.07415, over 13866.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01035, ecapa_loss=0.0001396, whisper_loss=0.0905, over 3840712.40 frames. ], batch size: 59, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:22:51,865 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4869780.0, ans=0.2 2024-08-20 16:23:04,142 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 
18 from LS+wenet, 21 from Vox, 12 fro AS 2024-08-20 16:23:12,321 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4869880.0, ans=0.1 2024-08-20 16:23:18,891 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4869880.0, ans=0.0 2024-08-20 16:23:40,159 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 31 from LS+wenet, 30 from Vox, 31 fro AS 2024-08-20 16:23:43,779 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4870080.0, ans=0.125 2024-08-20 16:24:17,260 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 23 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-20 16:24:26,339 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 12850, loss[loss=0.1267, beats_loss=0.01078, ecapa_loss=0.0001048, whisper_loss=0.1149, over 18862.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01039, ecapa_loss=0.0001387, whisper_loss=0.0911, over 3858123.98 frames. ], batch size: 70, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:24:35,186 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4870280.0, ans=0.2 2024-08-20 16:24:52,274 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.99 vs. 
limit=10.0 2024-08-20 16:24:54,666 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4870380.0, ans=0.125 2024-08-20 16:25:11,325 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.389e+01 2.612e+01 2.924e+01 4.831e+01, threshold=5.224e+01, percent-clipped=0.0 2024-08-20 16:25:19,600 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4870480.0, ans=0.1 2024-08-20 16:25:20,605 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 24 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-20 16:25:33,213 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4870580.0, ans=0.125 2024-08-20 16:25:41,069 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.06 vs. limit=15.0 2024-08-20 16:26:09,973 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4870680.0, ans=0.1 2024-08-20 16:26:12,661 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 12900, loss[loss=0.09907, beats_loss=0.01034, ecapa_loss=0.0001357, whisper_loss=0.08738, over 17486.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01034, ecapa_loss=0.0001388, whisper_loss=0.09187, over 3825546.79 frames. ], batch size: 72, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:26:23,757 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4870780.0, ans=0.2 2024-08-20 16:26:26,579 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.64 vs. limit=6.0 2024-08-20 16:26:26,987 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 
23 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-20 16:27:01,656 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 19 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-20 16:27:12,816 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 32 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-20 16:27:16,039 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4871080.0, ans=0.0 2024-08-20 16:27:18,399 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4871080.0, ans=0.1 2024-08-20 16:27:29,341 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 21 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-20 16:27:30,672 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-20 16:27:58,870 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 12950, loss[loss=0.08174, beats_loss=0.008749, ecapa_loss=0.0001194, whisper_loss=0.0718, over 14352.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01039, ecapa_loss=0.0001386, whisper_loss=0.09171, over 3828174.60 frames. ], batch size: 55, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:28:12,656 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 
24 from LS+wenet, 25 from Vox, 38 from AS
2024-08-20 16:28:40,003 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.303e+01 2.529e+01 2.820e+01 1.360e+02, threshold=5.058e+01, percent-clipped=1.0
2024-08-20 16:29:03,720 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4871580.0, ans=0.0
2024-08-20 16:29:40,537 WARNING [optim.py:496] (1/4) Scaling gradients by 0.08873618394136429, model_norm_threshold=50.58396911621094
2024-08-20 16:29:40,692 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.0.self_attn_weights.in_proj.bias with proportion 0.22, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.213e+04, grad_sumsq=8.006e+03, orig_rms_sq=9.010e+00
2024-08-20 16:29:47,542 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 13000, loss[loss=0.0839, beats_loss=0.01136, ecapa_loss=0.0001525, whisper_loss=0.07102, over 21387.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01049, ecapa_loss=0.0001389, whisper_loss=0.09093, over 3831588.52 frames. ], batch size: 90, lr: 1.84e-03, grad_scale: 5.764607523034235e+17
2024-08-20 16:30:28,916 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4871880.0, ans=0.025
2024-08-20 16:30:33,832 WARNING [optim.py:496] (1/4) Scaling gradients by 0.08446179330348969, model_norm_threshold=50.58396911621094
2024-08-20 16:30:33,986 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.496e+04, grad_sumsq=4.191e+06, orig_rms_sq=1.073e-02
2024-08-20 16:30:59,553 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4872080.0, ans=0.125
2024-08-20 16:31:08,373 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 27 from LS+wenet, 16 from Vox, 45 from AS
2024-08-20 16:31:39,782 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 13050, loss[loss=0.1287, beats_loss=0.006952, ecapa_loss=0.0001329, whisper_loss=0.1204, over 15579.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01052, ecapa_loss=0.0001393, whisper_loss=0.09155, over 3841651.81 frames. ], batch size: 56, lr: 1.84e-03, grad_scale: 5.764607523034235e+17
2024-08-20 16:31:41,779 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4872280.0, ans=0.1
2024-08-20 16:31:56,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4872280.0, ans=0.125
2024-08-20 16:32:21,352 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.780e+01 2.399e+01 2.559e+01 2.848e+01 5.989e+02, threshold=5.117e+01, percent-clipped=3.0
2024-08-20 16:32:31,205 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=4872480.0, ans=10.0
2024-08-20 16:32:36,187 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4872480.0, ans=0.2
2024-08-20 16:32:39,183 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 28 from LS+wenet, 23 from Vox, 39 from AS
2024-08-20 16:32:48,061 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.46 vs. limit=15.0
2024-08-20 16:33:07,257 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 18 from LS+wenet, 17 from Vox, 22 from AS
2024-08-20 16:33:27,449 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 13100, loss[loss=0.1155, beats_loss=0.009868, ecapa_loss=0.0001367, whisper_loss=0.1042, over 20608.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01052, ecapa_loss=0.0001404, whisper_loss=0.09098, over 3807315.56 frames. ], batch size: 82, lr: 1.84e-03, grad_scale: 5.764607523034235e+17
2024-08-20 16:33:30,965 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4872780.0, ans=0.125
2024-08-20 16:33:59,112 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4872880.0, ans=0.125
2024-08-20 16:34:06,017 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4872880.0, ans=0.125
2024-08-20 16:34:26,986 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4872980.0, ans=0.1
2024-08-20 16:34:59,428 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.49 vs. limit=6.0
2024-08-20 16:35:04,364 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4873180.0, ans=0.2
2024-08-20 16:35:04,421 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-08-20 16:35:23,941 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 13150, loss[loss=0.1131, beats_loss=0.01229, ecapa_loss=0.0001276, whisper_loss=0.09948, over 22904.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01054, ecapa_loss=0.0001417, whisper_loss=0.08977, over 3759397.02 frames. ], batch size: 93, lr: 1.84e-03, grad_scale: 5.764607523034235e+17
2024-08-20 16:35:24,529 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=4.671e-02
2024-08-20 16:35:34,094 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4873280.0, ans=0.125
2024-08-20 16:35:43,689 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.82 vs. limit=22.5
2024-08-20 16:35:58,142 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4873380.0, ans=0.125
2024-08-20 16:36:06,540 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4873380.0, ans=0.125
2024-08-20 16:36:06,784 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.26 vs. limit=15.0
2024-08-20 16:36:10,110 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.306e+01 2.480e+01 2.703e+01 4.896e+01, threshold=4.961e+01, percent-clipped=0.0
2024-08-20 16:36:59,125 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 24 from LS+wenet, 18 from Vox, 22 from AS
2024-08-20 16:37:15,786 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 13200, loss[loss=0.09865, beats_loss=0.01046, ecapa_loss=0.0001494, whisper_loss=0.0867, over 19409.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01052, ecapa_loss=0.000141, whisper_loss=0.08985, over 3751821.65 frames. ], batch size: 78, lr: 1.84e-03, grad_scale: 5.764607523034235e+17
2024-08-20 16:37:39,663 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4873880.0, ans=0.2
2024-08-20 16:38:04,550 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4873980.0, ans=0.125
2024-08-20 16:38:17,105 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 32 from LS+wenet, 15 from Vox, 36 from AS
2024-08-20 16:38:18,733 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4873980.0, ans=0.125
2024-08-20 16:38:59,378 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.84 vs. limit=15.0
2024-08-20 16:39:03,127 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.94 vs. limit=15.0
2024-08-20 16:39:05,065 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.86 vs. limit=22.5
2024-08-20 16:39:05,619 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 13250, loss[loss=0.1015, beats_loss=0.01381, ecapa_loss=9.507e-05, whisper_loss=0.08673, over 17849.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01047, ecapa_loss=0.0001407, whisper_loss=0.09021, over 3790349.71 frames. ], batch size: 69, lr: 1.84e-03, grad_scale: 5.764607523034235e+17
2024-08-20 16:39:13,650 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4874280.0, ans=0.125
2024-08-20 16:39:15,475 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 27 from LS+wenet, 26 from Vox, 33 from AS
2024-08-20 16:39:26,947 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4874380.0, ans=0.0
2024-08-20 16:39:40,113 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4874380.0, ans=0.125
2024-08-20 16:39:40,293 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.07 vs. limit=15.0
2024-08-20 16:39:47,290 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.277e+01 2.601e+01 3.009e+01 4.180e+01, threshold=5.201e+01, percent-clipped=0.0
2024-08-20 16:40:51,041 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 13300, loss[loss=0.1163, beats_loss=0.008715, ecapa_loss=0.0001495, whisper_loss=0.1061, over 18894.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01051, ecapa_loss=0.000139, whisper_loss=0.08906, over 3767416.83 frames. ], batch size: 73, lr: 1.84e-03, grad_scale: 5.764607523034235e+17
2024-08-20 16:40:54,001 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4874780.0, ans=0.0
2024-08-20 16:41:02,099 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 22 from LS+wenet, 24 from Vox, 40 from AS
2024-08-20 16:41:17,156 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 17 from LS+wenet, 21 from Vox, 24 from AS
2024-08-20 16:41:28,672 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4874880.0, ans=0.0
2024-08-20 16:41:39,933 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.79 vs. limit=15.0
2024-08-20 16:41:46,775 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.55 vs. limit=15.0
2024-08-20 16:41:47,646 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 20 from LS+wenet, 22 from Vox, 34 from AS
2024-08-20 16:41:56,974 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4875080.0, ans=0.0
2024-08-20 16:42:02,146 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 16 from LS+wenet, 18 from Vox, 25 from AS
2024-08-20 16:42:21,076 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4875180.0, ans=0.125
2024-08-20 16:42:40,144 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 13350, loss[loss=0.1302, beats_loss=0.005899, ecapa_loss=0.0001799, whisper_loss=0.1225, over 17004.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01046, ecapa_loss=0.0001399, whisper_loss=0.08873, over 3797196.92 frames. ], batch size: 66, lr: 1.84e-03, grad_scale: 5.764607523034235e+17
2024-08-20 16:42:45,441 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4875280.0, ans=0.07
2024-08-20 16:42:47,544 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=4875280.0, ans=0.05
2024-08-20 16:43:03,654 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 16 from LS+wenet, 13 from Vox, 24 from AS
2024-08-20 16:43:10,208 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 19 from LS+wenet, 10 from Vox, 29 from AS
2024-08-20 16:43:11,568 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4875380.0, ans=0.125
2024-08-20 16:43:14,128 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4875380.0, ans=0.0
2024-08-20 16:43:15,585 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4875380.0, ans=0.035
2024-08-20 16:43:21,627 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 23 from LS+wenet, 17 from Vox, 30 from AS
2024-08-20 16:43:23,132 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.727e+01 2.248e+01 2.436e+01 2.816e+01 2.858e+02, threshold=4.871e+01, percent-clipped=3.0
2024-08-20 16:43:24,805 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4875480.0, ans=0.0
2024-08-20 16:43:32,014 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 23 from LS+wenet, 28 from Vox, 30 from AS
2024-08-20 16:43:34,952 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 26 from LS+wenet, 22 from Vox, 32 from AS
2024-08-20 16:43:49,375 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4875580.0, ans=0.2
2024-08-20 16:44:32,071 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 37 from LS+wenet, 27 from Vox, 26 from AS
2024-08-20 16:44:34,395 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 13400, loss[loss=0.1253, beats_loss=0.007065, ecapa_loss=0.0001825, whisper_loss=0.1164, over 22195.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.0104, ecapa_loss=0.0001407, whisper_loss=0.0891, over 3764660.94 frames. ], batch size: 90, lr: 1.84e-03, grad_scale: 5.764607523034235e+17
2024-08-20 16:44:38,620 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4875780.0, ans=0.125
2024-08-20 16:44:41,048 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4875780.0, ans=0.1
2024-08-20 16:44:46,495 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 33 from LS+wenet, 24 from Vox, 32 from AS
2024-08-20 16:45:01,410 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 22 from LS+wenet, 12 from Vox, 27 from AS
2024-08-20 16:45:10,081 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4875880.0, ans=0.1
2024-08-20 16:45:28,012 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 28 from LS+wenet, 26 from Vox, 34 from AS
2024-08-20 16:45:34,154 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4875980.0, ans=0.125
2024-08-20 16:45:51,569 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4876080.0, ans=0.125
2024-08-20 16:45:51,728 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.24 vs. limit=22.5
2024-08-20 16:45:56,242 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4876080.0, ans=0.0
2024-08-20 16:46:00,690 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4876080.0, ans=0.125
2024-08-20 16:46:17,865 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4876180.0, ans=0.125
2024-08-20 16:46:30,406 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4876180.0, ans=0.0
2024-08-20 16:46:33,331 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 13450, loss[loss=0.09277, beats_loss=0.01089, ecapa_loss=0.0001738, whisper_loss=0.08013, over 19254.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01036, ecapa_loss=0.0001407, whisper_loss=0.08928, over 3762711.82 frames. ], batch size: 79, lr: 1.84e-03, grad_scale: 5.764607523034235e+17
2024-08-20 16:47:00,569 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4876380.0, ans=0.1
2024-08-20 16:47:20,305 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.341e+01 2.518e+01 2.794e+01 2.882e+02, threshold=5.035e+01, percent-clipped=1.0
2024-08-20 16:47:35,990 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4876480.0, ans=0.125
2024-08-20 16:47:47,743 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4876580.0, ans=0.0
2024-08-20 16:48:13,693 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 23 from LS+wenet, 29 from Vox, 38 from AS
2024-08-20 16:48:15,686 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 28 from LS+wenet, 19 from Vox, 42 from AS
2024-08-20 16:48:20,630 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4876680.0, ans=0.2
2024-08-20 16:48:25,848 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 13500, loss[loss=0.1139, beats_loss=0.009178, ecapa_loss=0.0001331, whisper_loss=0.1033, over 17563.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01039, ecapa_loss=0.0001397, whisper_loss=0.08956, over 3801751.00 frames. ], batch size: 68, lr: 1.84e-03, grad_scale: 5.764607523034235e+17
2024-08-20 16:49:04,782 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 21 from LS+wenet, 8 from Vox, 26 from AS
2024-08-20 16:49:06,561 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4876880.0, ans=0.035
2024-08-20 16:49:11,861 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.75 vs. limit=12.0
2024-08-20 16:49:20,870 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.809e+01
2024-08-20 16:49:24,979 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 34 from LS+wenet, 14 from Vox, 37 from AS
2024-08-20 16:49:39,476 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4877080.0, ans=0.125
2024-08-20 16:50:03,331 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 24 from LS+wenet, 17 from Vox, 34 from AS
2024-08-20 16:50:21,769 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 13550, loss[loss=0.09149, beats_loss=0.01239, ecapa_loss=0.000104, whisper_loss=0.07806, over 13605.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01041, ecapa_loss=0.0001395, whisper_loss=0.08959, over 3780035.89 frames. ], batch size: 55, lr: 1.84e-03, grad_scale: 5.764607523034235e+17
2024-08-20 16:50:31,139 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.21 vs. limit=15.0
2024-08-20 16:50:53,446 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4877380.0, ans=0.125
2024-08-20 16:51:09,908 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4877480.0, ans=0.1
2024-08-20 16:51:10,945 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.306e+01 2.534e+01 2.808e+01 5.425e+01, threshold=5.068e+01, percent-clipped=1.0
2024-08-20 16:51:22,294 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4877480.0, ans=0.125
2024-08-20 16:51:27,724 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 20 from LS+wenet, 12 from Vox, 18 from AS
2024-08-20 16:51:46,841 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 31 from LS+wenet, 24 from Vox, 35 from AS
2024-08-20 16:51:48,613 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.76 vs. limit=15.0
2024-08-20 16:51:52,719 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4877580.0, ans=0.2
2024-08-20 16:51:53,816 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4877580.0, ans=0.1
2024-08-20 16:52:07,667 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.45 vs. limit=15.0
2024-08-20 16:52:23,270 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 13600, loss[loss=0.09925, beats_loss=0.01046, ecapa_loss=0.0001288, whisper_loss=0.0875, over 20628.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01037, ecapa_loss=0.0001397, whisper_loss=0.08996, over 3792110.65 frames. ], batch size: 82, lr: 1.84e-03, grad_scale: 5.764607523034235e+17
2024-08-20 16:53:54,012 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4878080.0, ans=0.0
2024-08-20 16:54:13,145 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4878180.0, ans=0.125
2024-08-20 16:54:24,382 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 13650, loss[loss=0.1099, beats_loss=0.008662, ecapa_loss=0.0001224, whisper_loss=0.1001, over 20102.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01037, ecapa_loss=0.0001402, whisper_loss=0.08994, over 3783214.43 frames. ], batch size: 75, lr: 1.84e-03, grad_scale: 5.764607523034235e+17
2024-08-20 16:54:29,901 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4878280.0, ans=0.0
2024-08-20 16:54:33,827 INFO [train_multi_KD3.py:845] (1/4) A total of 49 cuts. 18 from LS+wenet, 11 from Vox, 20 from AS
2024-08-20 16:54:45,644 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 19 from LS+wenet, 22 from Vox, 27 from AS
2024-08-20 16:54:52,582 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4878380.0, ans=0.0
2024-08-20 16:55:04,193 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.87 vs. limit=6.0
2024-08-20 16:55:11,582 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.380e+01 2.594e+01 2.939e+01 1.944e+02, threshold=5.188e+01, percent-clipped=3.0
2024-08-20 16:55:24,040 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 31 from LS+wenet, 28 from Vox, 24 from AS
2024-08-20 16:55:25,320 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4878480.0, ans=0.0
2024-08-20 16:55:28,980 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 30 from LS+wenet, 20 from Vox, 40 from AS
2024-08-20 16:55:32,761 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.34 vs. limit=6.0
2024-08-20 16:55:48,049 INFO [train_multi_KD3.py:845] (1/4) A total of 52 cuts. 15 from LS+wenet, 19 from Vox, 18 from AS
2024-08-20 16:56:05,931 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4878680.0, ans=0.1
2024-08-20 16:56:15,118 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4878680.0, ans=0.125
2024-08-20 16:56:23,407 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 13700, loss[loss=0.1149, beats_loss=0.008688, ecapa_loss=0.0001285, whisper_loss=0.1049, over 16902.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01026, ecapa_loss=0.0001402, whisper_loss=0.09047, over 3795276.84 frames. ], batch size: 63, lr: 1.84e-03, grad_scale: 5.764607523034235e+17
2024-08-20 16:56:46,374 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4878880.0, ans=0.2
2024-08-20 16:56:58,047 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.95 vs. limit=15.0
2024-08-20 16:57:08,365 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 24 from LS+wenet, 22 from Vox, 36 from AS
2024-08-20 16:57:14,782 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4878980.0, ans=0.2
2024-08-20 16:57:17,904 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 27 from LS+wenet, 18 from Vox, 43 from AS
2024-08-20 16:57:42,697 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 26 from LS+wenet, 27 from Vox, 33 from AS
2024-08-20 16:58:00,090 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 26 from LS+wenet, 21 from Vox, 24 from AS
2024-08-20 16:58:09,974 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 29 from LS+wenet, 21 from Vox, 41 from AS
2024-08-20 16:58:17,607 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 13750, loss[loss=0.1022, beats_loss=0.009291, ecapa_loss=0.000164, whisper_loss=0.09125, over 20760.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01032, ecapa_loss=0.0001396, whisper_loss=0.09009, over 3805822.56 frames. ], batch size: 88, lr: 1.84e-03, grad_scale: 5.764607523034235e+17
2024-08-20 16:58:35,969 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 27 from LS+wenet, 30 from Vox, 33 from AS
2024-08-20 16:58:47,834 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4879380.0, ans=0.0
2024-08-20 16:59:03,515 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.268e+01 2.560e+01 2.819e+01 5.576e+01, threshold=5.121e+01, percent-clipped=1.0
2024-08-20 16:59:20,897 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 32 from LS+wenet, 23 from Vox, 35 from AS
2024-08-20 16:59:23,177 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 17 from LS+wenet, 12 from Vox, 22 from AS
2024-08-20 16:59:36,536 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4879580.0, ans=0.2
2024-08-20 16:59:41,689 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4879580.0, ans=0.0
2024-08-20 17:00:15,478 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 13800, loss[loss=0.09084, beats_loss=0.01351, ecapa_loss=0.000114, whisper_loss=0.0762, over 22448.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01029, ecapa_loss=0.0001408, whisper_loss=0.09029, over 3791552.28 frames. ], batch size: 91, lr: 1.84e-03, grad_scale: 5.764607523034235e+17
2024-08-20 17:00:30,170 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4879780.0, ans=0.0
2024-08-20 17:00:54,331 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 25 from LS+wenet, 25 from Vox, 31 from AS
2024-08-20 17:01:07,755 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4879980.0, ans=0.125
2024-08-20 17:01:10,422 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4879980.0, ans=0.125
2024-08-20 17:01:18,293 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 29 from LS+wenet, 19 from Vox, 40 from AS
2024-08-20 17:01:34,516 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4880080.0, ans=0.125
2024-08-20 17:01:42,680 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 from AS
2024-08-20 17:02:08,535 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4880180.0, ans=0.0
2024-08-20 17:02:14,264 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 13850, loss[loss=0.134, beats_loss=0.009073, ecapa_loss=0.0001175, whisper_loss=0.1237, over 21103.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01032, ecapa_loss=0.0001392, whisper_loss=0.09054, over 3785701.40 frames. ], batch size: 76, lr: 1.84e-03, grad_scale: 5.764607523034235e+17
2024-08-20 17:02:27,855 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4880280.0, ans=0.125
2024-08-20 17:02:40,915 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 26 from LS+wenet, 16 from Vox, 28 from AS
2024-08-20 17:02:50,856 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 16 from LS+wenet, 19 from Vox, 24 from AS
2024-08-20 17:03:01,945 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.552e+01 2.225e+01 2.417e+01 2.709e+01 3.540e+01, threshold=4.834e+01, percent-clipped=0.0
2024-08-20 17:03:08,174 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 28 from LS+wenet, 19 from Vox, 29 from AS
2024-08-20 17:03:28,406 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.02 vs. limit=10.0
2024-08-20 17:03:32,306 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4880580.0, ans=0.125
2024-08-20 17:03:33,316 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 27 from LS+wenet, 18 from Vox, 31 from AS
2024-08-20 17:03:53,666 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4880680.0, ans=0.125
2024-08-20 17:04:06,190 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 21 from LS+wenet, 17 from Vox, 29 from AS
2024-08-20 17:04:06,280 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4880680.0, ans=0.125
2024-08-20 17:04:07,189 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4880680.0, ans=0.1
2024-08-20 17:04:07,268 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4880680.0, ans=0.0
2024-08-20 17:04:10,310 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 13900, loss[loss=0.1147, beats_loss=0.01029, ecapa_loss=0.000137, whisper_loss=0.1031, over 20365.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0104, ecapa_loss=0.0001389, whisper_loss=0.09019, over 3763310.30 frames. ], batch size: 81, lr: 1.84e-03, grad_scale: 5.764607523034235e+17
2024-08-20 17:04:22,180 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 31 from LS+wenet, 25 from Vox, 34 from AS
2024-08-20 17:04:37,746 WARNING [optim.py:496] (1/4) Scaling gradients by 0.016832223162055016, model_norm_threshold=48.33732604980469
2024-08-20 17:04:37,901 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.09, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.526e+05, grad_sumsq=7.526e+05, orig_rms_sq=1.000e+00
2024-08-20 17:05:03,196 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0
2024-08-20 17:05:28,739 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4881080.0, ans=0.0
2024-08-20 17:05:57,357 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 13950, loss[loss=0.1108, beats_loss=0.01018, ecapa_loss=0.0001343, whisper_loss=0.0993, over 23113.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01047, ecapa_loss=0.0001398, whisper_loss=0.08959, over 3808799.25 frames. ], batch size: 90, lr: 1.84e-03, grad_scale: 5.764607523034235e+17
2024-08-20 17:05:57,538 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 22 from LS+wenet, 27 from Vox, 29 from AS
2024-08-20 17:06:28,118 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.53 vs. limit=15.0
2024-08-20 17:06:40,053 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.333e+01 2.602e+01 2.929e+01 2.872e+03, threshold=5.204e+01, percent-clipped=2.0
2024-08-20 17:06:46,499 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 23 from LS+wenet, 24 from Vox, 37 from AS
2024-08-20 17:06:51,439 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4881480.0, ans=0.125
2024-08-20 17:07:00,613 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.99 vs. limit=15.0
2024-08-20 17:07:05,660 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 19 from LS+wenet, 9 from Vox, 23 from AS
2024-08-20 17:07:09,794 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.56 vs. limit=12.0
2024-08-20 17:07:23,049 INFO [train_multi_KD3.py:845] (1/4) A total of 54 cuts. 19 from LS+wenet, 16 from Vox, 19 from AS
2024-08-20 17:07:33,342 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 23 from LS+wenet, 19 from Vox, 20 from AS
2024-08-20 17:07:40,025 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 20 from LS+wenet, 22 from Vox, 28 from AS
2024-08-20 17:07:42,173 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 18 from LS+wenet, 20 from Vox, 26 from AS
2024-08-20 17:07:43,294 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 14000, loss[loss=0.08783, beats_loss=0.0103, ecapa_loss=0.0001325, whisper_loss=0.0762, over 16792.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01049, ecapa_loss=0.0001393, whisper_loss=0.08961, over 3808735.36 frames. ], batch size: 64, lr: 1.84e-03, grad_scale: 5.764607523034235e+17
2024-08-20 17:07:49,682 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4881780.0, ans=0.125
2024-08-20 17:07:50,203 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.14 vs. limit=12.0
2024-08-20 17:07:51,956 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4881780.0, ans=0.0
2024-08-20 17:08:31,540 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.38 vs. limit=6.0
2024-08-20 17:09:10,976 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 24 from LS+wenet, 25 from Vox, 37 from AS
2024-08-20 17:09:21,492 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4882180.0, ans=0.0
2024-08-20 17:09:38,434 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 14050, loss[loss=0.1012, beats_loss=0.01372, ecapa_loss=0.0001098, whisper_loss=0.08638, over 23545.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01048, ecapa_loss=0.0001392, whisper_loss=0.08954, over 3814627.80 frames. ], batch size: 92, lr: 1.84e-03, grad_scale: 5.764607523034235e+17
2024-08-20 17:10:00,046 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.36 vs. limit=15.0
2024-08-20 17:10:05,499 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4882380.0, ans=0.125
2024-08-20 17:10:25,489 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.662e+01 2.229e+01 2.456e+01 2.749e+01 5.293e+01, threshold=4.913e+01, percent-clipped=1.0
2024-08-20 17:10:26,198 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4882480.0, ans=0.125
2024-08-20 17:10:32,530 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 22 from LS+wenet, 30 from Vox, 36 from AS
2024-08-20 17:10:43,793 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4882480.0, ans=0.07
2024-08-20 17:11:32,128 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4882680.0, ans=0.125
2024-08-20 17:11:32,182 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4882680.0, ans=0.1
2024-08-20 17:11:35,642 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4882780.0, ans=0.0
2024-08-20 17:11:36,463 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 14100, loss[loss=0.0877, beats_loss=0.009572, ecapa_loss=0.0001339, whisper_loss=0.07679, over 12283.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01039, ecapa_loss=0.0001409, whisper_loss=0.08971, over 3847769.16 frames. ], batch size: 50, lr: 1.84e-03, grad_scale: 5.764607523034235e+17
2024-08-20 17:12:29,961 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4882980.0, ans=0.0
2024-08-20 17:13:00,509 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4883080.0, ans=0.2
2024-08-20 17:13:02,398 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4883080.0, ans=0.0
2024-08-20 17:13:33,711 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 14150, loss[loss=0.1165, beats_loss=0.009964, ecapa_loss=0.0001211, whisper_loss=0.1053, over 14567.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01033, ecapa_loss=0.0001408, whisper_loss=0.09027, over 3815190.36 frames. ], batch size: 53, lr: 1.84e-03, grad_scale: 5.764607523034235e+17
2024-08-20 17:13:34,264 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4883280.0, ans=0.125
2024-08-20 17:13:35,169 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=4883280.0, ans=0.95
2024-08-20 17:13:36,047 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 16 from LS+wenet, 20 from Vox, 15 from AS
2024-08-20 17:13:51,840 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 31 from LS+wenet, 25 from Vox, 34 from AS
2024-08-20 17:14:11,819 INFO [train_multi_KD3.py:845] (1/4) A total of 95 cuts. 20 from LS+wenet, 29 from Vox, 46 from AS
2024-08-20 17:14:18,495 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.324e+01 2.479e+01 2.720e+01 4.062e+01, threshold=4.958e+01, percent-clipped=0.0
2024-08-20 17:14:23,362 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts.
22 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-20 17:15:14,625 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.43 vs. limit=15.0 2024-08-20 17:15:25,210 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 14200, loss[loss=0.0897, beats_loss=0.01298, ecapa_loss=8.952e-05, whisper_loss=0.07583, over 16086.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01033, ecapa_loss=0.0001405, whisper_loss=0.09046, over 3830449.95 frames. ], batch size: 60, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:15:39,204 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.17 vs. limit=15.0 2024-08-20 17:15:40,726 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4883780.0, ans=0.125 2024-08-20 17:15:42,858 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0 2024-08-20 17:16:17,807 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4883980.0, ans=0.0 2024-08-20 17:16:22,498 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 25 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-20 17:17:00,687 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 20 from LS+wenet, 32 from Vox, 39 fro AS 2024-08-20 17:17:02,365 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 32 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-20 17:17:10,845 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 14250, loss[loss=0.1163, beats_loss=0.009321, ecapa_loss=0.000133, whisper_loss=0.1056, over 21842.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0103, ecapa_loss=0.0001401, whisper_loss=0.09046, over 3838693.96 frames. 
], batch size: 83, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:17:17,226 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 26 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-20 17:17:18,399 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0 2024-08-20 17:17:18,466 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.40 vs. limit=22.5 2024-08-20 17:17:20,708 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=26.16 vs. limit=22.5 2024-08-20 17:17:22,666 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4884280.0, ans=0.125 2024-08-20 17:17:25,132 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.23 vs. limit=22.5 2024-08-20 17:17:27,213 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4884280.0, ans=0.1 2024-08-20 17:17:53,439 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.790e+01 2.287e+01 2.497e+01 2.839e+01 4.280e+02, threshold=4.993e+01, percent-clipped=1.0 2024-08-20 17:18:03,715 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4884480.0, ans=0.1 2024-08-20 17:18:06,656 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 19 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-20 17:18:19,506 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 
22 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-20 17:18:24,564 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4884580.0, ans=0.1 2024-08-20 17:18:29,335 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.96 vs. limit=15.0 2024-08-20 17:18:53,722 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 14300, loss[loss=0.1014, beats_loss=0.009321, ecapa_loss=0.0001752, whisper_loss=0.09036, over 21402.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0103, ecapa_loss=0.0001404, whisper_loss=0.09029, over 3825572.22 frames. ], batch size: 88, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:18:53,984 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 24 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-20 17:19:00,989 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 26 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-20 17:19:02,090 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4884780.0, ans=0.0 2024-08-20 17:19:09,429 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4884780.0, ans=0.125 2024-08-20 17:19:09,895 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.08 vs. limit=12.0 2024-08-20 17:19:20,829 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4884880.0, ans=0.125 2024-08-20 17:19:35,515 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4884980.0, ans=0.0 2024-08-20 17:19:54,452 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.57 vs. 
limit=22.5 2024-08-20 17:20:01,256 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4885080.0, ans=0.2 2024-08-20 17:20:08,150 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.59 vs. limit=15.0 2024-08-20 17:20:38,291 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 14350, loss[loss=0.1189, beats_loss=0.01198, ecapa_loss=0.0001331, whisper_loss=0.1056, over 22127.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0103, ecapa_loss=0.0001403, whisper_loss=0.09078, over 3817844.73 frames. ], batch size: 88, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:20:51,432 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.37 vs. limit=15.0 2024-08-20 17:20:56,702 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 33 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-20 17:21:04,532 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4885380.0, ans=0.125 2024-08-20 17:21:11,032 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 21 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-20 17:21:18,660 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4885480.0, ans=0.125 2024-08-20 17:21:19,473 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.614e+01 2.424e+01 2.742e+01 3.115e+01 1.804e+02, threshold=5.484e+01, percent-clipped=2.0 2024-08-20 17:21:21,098 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 
25 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-20 17:21:40,399 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4885580.0, ans=0.0 2024-08-20 17:21:51,090 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=4885580.0, ans=0.05 2024-08-20 17:21:59,655 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 25 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-20 17:22:01,177 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.19 vs. limit=15.0 2024-08-20 17:22:03,410 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.59 vs. limit=6.0 2024-08-20 17:22:13,005 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 30 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-20 17:22:13,866 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4885680.0, ans=0.04949747468305833 2024-08-20 17:22:18,666 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 14400, loss[loss=0.1142, beats_loss=0.008732, ecapa_loss=0.0001362, whisper_loss=0.1041, over 22259.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01032, ecapa_loss=0.0001399, whisper_loss=0.09097, over 3796275.49 frames. ], batch size: 87, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:22:53,238 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 38 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-20 17:22:58,783 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 19 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-20 17:23:27,543 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 
19 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-20 17:23:47,592 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 24 from LS+wenet, 31 from Vox, 39 fro AS 2024-08-20 17:23:58,242 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 14450, loss[loss=0.08958, beats_loss=0.009564, ecapa_loss=0.0001612, whisper_loss=0.0784, over 21611.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01034, ecapa_loss=0.0001389, whisper_loss=0.0913, over 3802711.72 frames. ], batch size: 89, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:24:41,210 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4886480.0, ans=0.125 2024-08-20 17:24:41,901 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.742e+01 2.260e+01 2.488e+01 2.810e+01 3.938e+01, threshold=4.976e+01, percent-clipped=0.0 2024-08-20 17:24:47,029 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4886480.0, ans=0.0 2024-08-20 17:24:47,160 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.536e+01 2024-08-20 17:24:49,325 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 28 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-20 17:25:11,606 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4886580.0, ans=0.125 2024-08-20 17:25:14,103 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=4886580.0, ans=0.025 2024-08-20 17:25:40,959 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 14500, loss[loss=0.1121, beats_loss=0.009851, ecapa_loss=0.0001306, whisper_loss=0.1009, over 18028.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01038, ecapa_loss=0.0001389, whisper_loss=0.09114, over 3801882.64 frames. 
], batch size: 70, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:25:58,928 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 29 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-20 17:26:14,288 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4886880.0, ans=0.0 2024-08-20 17:27:06,716 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 24 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-20 17:27:12,698 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4887180.0, ans=0.2 2024-08-20 17:27:25,744 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 14550, loss[loss=0.1075, beats_loss=0.01221, ecapa_loss=0.0001292, whisper_loss=0.094, over 17543.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01039, ecapa_loss=0.0001393, whisper_loss=0.09018, over 3778507.87 frames. ], batch size: 69, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:27:28,657 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 15 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-20 17:27:38,653 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 27 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-20 17:27:48,893 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.36 vs. 
limit=15.0 2024-08-20 17:27:52,996 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4887380.0, ans=0.0 2024-08-20 17:28:11,719 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.302e+01 2.517e+01 2.766e+01 3.665e+01, threshold=5.035e+01, percent-clipped=0.0 2024-08-20 17:28:21,121 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4887480.0, ans=0.125 2024-08-20 17:28:26,474 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.39 vs. limit=22.5 2024-08-20 17:28:31,430 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 16 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-20 17:28:33,365 INFO [train_multi_KD3.py:845] (1/4) A total of 54 cuts. 21 from LS+wenet, 10 from Vox, 23 fro AS 2024-08-20 17:28:58,035 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 31 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-20 17:28:58,848 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4887680.0, ans=0.125 2024-08-20 17:29:15,855 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 14600, loss[loss=0.1069, beats_loss=0.009706, ecapa_loss=0.0001436, whisper_loss=0.09572, over 21759.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01036, ecapa_loss=0.0001389, whisper_loss=0.08982, over 3792670.60 frames. ], batch size: 88, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:29:34,754 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.39 vs. limit=15.0 2024-08-20 17:29:47,917 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 
24 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-20 17:30:17,593 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 26 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-20 17:30:24,427 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 16 from LS+wenet, 20 from Vox, 17 fro AS 2024-08-20 17:30:25,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4888080.0, ans=0.125 2024-08-20 17:30:49,819 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4888180.0, ans=0.0 2024-08-20 17:30:54,064 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.96 vs. limit=22.5 2024-08-20 17:30:59,532 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 12 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-20 17:31:02,577 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 14650, loss[loss=0.09028, beats_loss=0.01134, ecapa_loss=0.0001494, whisper_loss=0.07744, over 20499.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01041, ecapa_loss=0.0001384, whisper_loss=0.0894, over 3768228.36 frames. ], batch size: 88, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:31:20,736 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.42 vs. limit=6.0 2024-08-20 17:31:24,441 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.89 vs. limit=10.0 2024-08-20 17:31:32,088 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 30 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-20 17:31:36,465 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 
19 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-20 17:31:48,221 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.228e+01 2.454e+01 2.785e+01 6.684e+01, threshold=4.907e+01, percent-clipped=2.0 2024-08-20 17:31:48,436 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 23 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-20 17:31:54,497 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 16 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-20 17:32:31,502 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.435e+01 2024-08-20 17:32:46,602 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4888680.0, ans=0.0 2024-08-20 17:32:48,726 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4888680.0, ans=0.125 2024-08-20 17:32:51,506 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 14700, loss[loss=0.1201, beats_loss=0.009139, ecapa_loss=0.0001519, whisper_loss=0.1094, over 21866.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0104, ecapa_loss=0.0001399, whisper_loss=0.08969, over 3787306.45 frames. ], batch size: 88, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:33:29,011 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 25 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-20 17:34:04,423 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.07 vs. 
limit=15.0 2024-08-20 17:34:16,521 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4889180.0, ans=0.0 2024-08-20 17:34:18,387 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4889180.0, ans=0.125 2024-08-20 17:34:22,060 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 30 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-20 17:34:23,108 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4889180.0, ans=0.1 2024-08-20 17:34:29,530 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4889180.0, ans=0.0 2024-08-20 17:34:33,929 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4889280.0, ans=0.0 2024-08-20 17:34:34,567 INFO [train_multi_KD3.py:1117] (1/4) Epoch 33, batch 14750, loss[loss=0.1026, beats_loss=0.01064, ecapa_loss=0.0001281, whisper_loss=0.09066, over 22212.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01037, ecapa_loss=0.0001393, whisper_loss=0.08984, over 3804784.63 frames. ], batch size: 90, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:35:15,047 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4889480.0, ans=0.0 2024-08-20 17:35:17,774 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.806e+01 2.334e+01 2.652e+01 2.948e+01 4.454e+01, threshold=5.304e+01, percent-clipped=0.0 2024-08-20 17:35:43,317 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4889580.0, ans=0.125 2024-08-20 17:35:48,093 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.35 vs. 
limit=22.5 2024-08-20 17:36:34,078 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 0, loss[loss=0.08617, beats_loss=0.00948, ecapa_loss=0.0001739, whisper_loss=0.07495, over 18786.00 frames. ], tot_loss[loss=0.08617, beats_loss=0.00948, ecapa_loss=0.0001739, whisper_loss=0.07495, over 18786.00 frames. ], batch size: 77, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:36:34,078 INFO [train_multi_KD3.py:1140] (1/4) Computing validation loss 2024-08-20 17:37:09,454 INFO [train_multi_KD3.py:1150] (1/4) Epoch 34, validation on ASR_libri: loss=0.2546, beats_loss=0, ecapa_loss=0.0005123, whisper_loss=0.2495, over 931116.00 frames. 2024-08-20 17:37:31,785 INFO [train_multi_KD3.py:1150] (1/4) Epoch 34, validation on SV_voxceleb1: loss=0.004, beats_loss=0, ecapa_loss=0.0004, whisper_loss=0, over 944235.00 frames. 2024-08-20 17:39:14,505 INFO [train_multi_KD3.py:1150] (1/4) Epoch 34, validation on AT_audioset: loss=0.02306, beats_loss=0.02306, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-20 17:39:14,508 INFO [train_multi_KD3.py:1156] (1/4) Maximum memory allocated so far is 30838MB 2024-08-20 17:39:19,340 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4889690.0, ans=0.1 2024-08-20 17:40:11,662 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4889890.0, ans=0.1 2024-08-20 17:40:20,958 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 34 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-20 17:40:43,418 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4889990.0, ans=0.0 2024-08-20 17:40:48,176 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. 
limit=6.0 2024-08-20 17:41:01,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4890090.0, ans=0.0 2024-08-20 17:41:04,063 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=6.70 vs. limit=12.0 2024-08-20 17:41:17,889 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4890190.0, ans=0.125 2024-08-20 17:41:19,464 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 50, loss[loss=0.1274, beats_loss=0.007635, ecapa_loss=0.0001267, whisper_loss=0.1185, over 17699.00 frames. ], tot_loss[loss=0.0977, beats_loss=0.009403, ecapa_loss=0.0001411, whisper_loss=0.08689, over 874332.87 frames. ], batch size: 65, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:41:20,260 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 17:41:20,267 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4890190.0, ans=0.2 2024-08-20 17:41:49,489 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4890290.0, ans=0.125 2024-08-20 17:41:49,526 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4890290.0, ans=0.125 2024-08-20 17:41:56,938 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4890290.0, ans=0.2 2024-08-20 17:42:33,409 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.20 vs. 
limit=6.0 2024-08-20 17:42:33,751 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.770e+01 2.429e+01 2.698e+01 2.927e+01 5.810e+01, threshold=5.396e+01, percent-clipped=1.0 2024-08-20 17:43:06,849 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 22 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-20 17:43:10,444 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4890590.0, ans=0.125 2024-08-20 17:43:20,588 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4890590.0, ans=0.1 2024-08-20 17:43:23,664 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 100, loss[loss=0.1053, beats_loss=0.009815, ecapa_loss=0.0001525, whisper_loss=0.09393, over 21773.00 frames. ], tot_loss[loss=0.09881, beats_loss=0.009174, ecapa_loss=0.0001418, whisper_loss=0.08822, over 1507751.37 frames. ], batch size: 90, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:43:39,000 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten.whitening_limit, batch_count=4890690.0, ans=15.0 2024-08-20 17:43:44,116 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=4890690.0, ans=0.95 2024-08-20 17:43:51,795 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4890790.0, ans=0.0 2024-08-20 17:44:18,937 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 29 from LS+wenet, 11 from Vox, 45 fro AS 2024-08-20 17:44:20,288 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4890890.0, ans=0.0 2024-08-20 17:44:34,322 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 
28 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-20 17:44:41,248 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4890990.0, ans=0.0 2024-08-20 17:44:42,565 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 27 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-20 17:44:58,291 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 25 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-20 17:45:05,001 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4890990.0, ans=0.0 2024-08-20 17:45:30,981 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 150, loss[loss=0.1235, beats_loss=0.01013, ecapa_loss=0.0001196, whisper_loss=0.1121, over 23994.00 frames. ], tot_loss[loss=0.09915, beats_loss=0.009271, ecapa_loss=0.0001413, whisper_loss=0.08847, over 1998274.59 frames. ], batch size: 90, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:45:38,800 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4891190.0, ans=0.0 2024-08-20 17:45:42,052 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 27 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-20 17:46:35,251 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.414e+01 2.624e+01 2.993e+01 4.090e+01, threshold=5.247e+01, percent-clipped=0.0 2024-08-20 17:46:39,095 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 26 from LS+wenet, 16 from Vox, 47 fro AS 2024-08-20 17:47:01,151 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 26 from LS+wenet, 30 from Vox, 31 fro AS 2024-08-20 17:47:15,755 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 200, loss[loss=0.1277, beats_loss=0.007388, ecapa_loss=0.0001276, whisper_loss=0.1191, over 18789.00 frames. ], tot_loss[loss=0.09985, beats_loss=0.009536, ecapa_loss=0.0001396, whisper_loss=0.08892, over 2366487.61 frames. 
], batch size: 69, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:47:19,020 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4891690.0, ans=0.125 2024-08-20 17:47:19,024 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4891690.0, ans=0.125 2024-08-20 17:47:38,348 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4891790.0, ans=0.0 2024-08-20 17:47:52,249 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4891790.0, ans=0.0 2024-08-20 17:47:53,813 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.627e+00 2024-08-20 17:47:58,045 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 24 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-20 17:48:02,400 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4891890.0, ans=0.0 2024-08-20 17:48:10,238 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.89 vs. limit=10.0 2024-08-20 17:48:26,163 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.39 vs. limit=22.5 2024-08-20 17:48:50,168 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4892190.0, ans=0.125 2024-08-20 17:48:50,911 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 250, loss[loss=0.1066, beats_loss=0.01054, ecapa_loss=0.0001167, whisper_loss=0.09494, over 21712.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.009873, ecapa_loss=0.0001373, whisper_loss=0.08917, over 2674804.31 frames. 
], batch size: 86, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:49:03,453 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4892190.0, ans=0.125 2024-08-20 17:49:14,966 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 26 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-20 17:49:18,454 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 25 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-20 17:49:21,496 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4892290.0, ans=0.1 2024-08-20 17:49:32,290 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4892390.0, ans=0.125 2024-08-20 17:49:45,448 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4892390.0, ans=0.125 2024-08-20 17:49:48,356 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.222e+01 2.433e+01 2.766e+01 4.202e+01, threshold=4.866e+01, percent-clipped=0.0 2024-08-20 17:50:04,186 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=10.69 vs. limit=12.0 2024-08-20 17:50:05,368 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4892590.0, ans=0.125 2024-08-20 17:50:23,572 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 300, loss[loss=0.09204, beats_loss=0.01067, ecapa_loss=0.0001344, whisper_loss=0.08002, over 17400.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.009976, ecapa_loss=0.0001387, whisper_loss=0.08997, over 2914646.05 frames. 
], batch size: 68, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:50:37,451 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4892690.0, ans=0.1 2024-08-20 17:50:57,416 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 26 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-20 17:51:32,771 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4892990.0, ans=0.0 2024-08-20 17:51:33,057 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.09 vs. limit=6.0 2024-08-20 17:51:52,692 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.63 vs. limit=15.0 2024-08-20 17:51:55,171 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 350, loss[loss=0.09673, beats_loss=0.009591, ecapa_loss=0.0001589, whisper_loss=0.08555, over 16495.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01017, ecapa_loss=0.0001375, whisper_loss=0.08919, over 3073997.77 frames. ], batch size: 64, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:52:04,189 INFO [train_multi_KD3.py:845] (1/4) A total of 54 cuts. 13 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-20 17:52:08,245 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 29 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-20 17:52:13,136 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4893290.0, ans=0.125 2024-08-20 17:52:21,583 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 
25 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-20 17:52:24,300 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4893290.0, ans=0.1 2024-08-20 17:52:34,395 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.71 vs. limit=15.0 2024-08-20 17:52:37,386 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.60 vs. limit=10.0 2024-08-20 17:52:39,049 INFO [train_multi_KD3.py:845] (1/4) A total of 95 cuts. 36 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-20 17:52:48,831 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.726e+01 2.330e+01 2.558e+01 2.874e+01 1.855e+02, threshold=5.116e+01, percent-clipped=1.0 2024-08-20 17:52:49,941 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4893490.0, ans=0.2 2024-08-20 17:53:04,872 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4893490.0, ans=0.125 2024-08-20 17:53:15,400 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 35 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-20 17:53:26,061 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 400, loss[loss=0.0772, beats_loss=0.01065, ecapa_loss=0.000179, whisper_loss=0.06476, over 15689.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01025, ecapa_loss=0.0001386, whisper_loss=0.08848, over 3218345.14 frames. ], batch size: 62, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:53:28,209 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 
18 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-20 17:53:37,541 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4893690.0, ans=0.1 2024-08-20 17:53:37,573 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4893690.0, ans=0.0 2024-08-20 17:53:45,924 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 24 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-20 17:53:48,050 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 28 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-20 17:53:50,705 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4893790.0, ans=0.0 2024-08-20 17:54:04,418 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4893890.0, ans=0.125 2024-08-20 17:54:26,550 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4893990.0, ans=0.2 2024-08-20 17:54:35,684 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.86 vs. limit=12.0 2024-08-20 17:54:55,807 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 450, loss[loss=0.1131, beats_loss=0.00956, ecapa_loss=0.0001126, whisper_loss=0.1024, over 23714.00 frames. ], tot_loss[loss=0.09993, beats_loss=0.0103, ecapa_loss=0.0001368, whisper_loss=0.08826, over 3388073.68 frames. ], batch size: 90, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:54:56,941 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4894190.0, ans=0.125 2024-08-20 17:55:00,488 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 
32 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-20 17:55:01,805 WARNING [optim.py:496] (1/4) Scaling gradients by 0.014215901494026184, model_norm_threshold=51.16255569458008 2024-08-20 17:55:01,963 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.649e+06, grad_sumsq=5.014e+05, orig_rms_sq=3.288e+00 2024-08-20 17:55:15,186 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.47 vs. limit=15.0 2024-08-20 17:55:18,995 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4894290.0, ans=0.125 2024-08-20 17:55:30,128 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 17 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-20 17:55:51,325 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.729e+01 2.355e+01 2.533e+01 2.775e+01 3.599e+03, threshold=5.067e+01, percent-clipped=2.0 2024-08-20 17:56:08,142 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.64 vs. limit=10.0 2024-08-20 17:56:27,832 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 500, loss[loss=0.1022, beats_loss=0.0093, ecapa_loss=0.0001396, whisper_loss=0.09146, over 21093.00 frames. ], tot_loss[loss=0.09977, beats_loss=0.01023, ecapa_loss=0.0001376, whisper_loss=0.08816, over 3448438.13 frames. ], batch size: 82, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:56:36,180 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2024-08-20 17:56:48,295 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.60 vs. 
limit=12.0 2024-08-20 17:57:02,342 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 23 from LS+wenet, 17 from Vox, 51 fro AS 2024-08-20 17:57:17,981 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4894890.0, ans=0.125 2024-08-20 17:57:19,494 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 29 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-20 17:57:21,847 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4894990.0, ans=0.125 2024-08-20 17:57:35,721 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4894990.0, ans=0.125 2024-08-20 17:57:43,548 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.45 vs. limit=22.5 2024-08-20 17:57:44,871 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4895090.0, ans=0.125 2024-08-20 17:57:54,667 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 23 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-20 17:57:58,249 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 550, loss[loss=0.1235, beats_loss=0.00855, ecapa_loss=0.0001557, whisper_loss=0.1134, over 22214.00 frames. ], tot_loss[loss=0.1, beats_loss=0.01026, ecapa_loss=0.0001377, whisper_loss=0.08841, over 3512573.04 frames. 
], batch size: 89, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:58:12,271 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4895190.0, ans=0.125 2024-08-20 17:58:29,948 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4895290.0, ans=0.0 2024-08-20 17:58:31,797 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4895290.0, ans=0.0 2024-08-20 17:58:37,129 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4895390.0, ans=0.125 2024-08-20 17:58:48,006 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.85 vs. limit=12.0 2024-08-20 17:58:51,378 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4895490.0, ans=0.125 2024-08-20 17:58:52,572 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.257e+01 2.490e+01 2.718e+01 3.602e+01, threshold=4.981e+01, percent-clipped=0.0 2024-08-20 17:59:04,334 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.75 vs. limit=15.0 2024-08-20 17:59:15,918 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4895590.0, ans=0.0 2024-08-20 17:59:18,157 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4895590.0, ans=0.0 2024-08-20 17:59:20,059 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.54 vs. 
limit=15.0 2024-08-20 17:59:22,912 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4895590.0, ans=0.0 2024-08-20 17:59:26,114 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4895690.0, ans=0.125 2024-08-20 17:59:26,130 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4895690.0, ans=0.125 2024-08-20 17:59:27,722 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 600, loss[loss=0.08632, beats_loss=0.01102, ecapa_loss=0.0001073, whisper_loss=0.07423, over 18912.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01023, ecapa_loss=0.0001372, whisper_loss=0.08872, over 3540305.89 frames. ], batch size: 73, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:59:38,099 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 29 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-20 17:59:52,351 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 24 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-20 18:00:10,754 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4895890.0, ans=0.125 2024-08-20 18:00:16,515 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.22 vs. limit=15.0 2024-08-20 18:00:31,163 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 
23 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-20 18:00:31,474 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4895990.0, ans=0.125 2024-08-20 18:00:33,472 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4895990.0, ans=0.125 2024-08-20 18:00:37,125 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.93 vs. limit=10.0 2024-08-20 18:00:43,900 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4896090.0, ans=0.125 2024-08-20 18:00:56,853 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 650, loss[loss=0.1031, beats_loss=0.009608, ecapa_loss=0.000129, whisper_loss=0.09224, over 19229.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01027, ecapa_loss=0.0001365, whisper_loss=0.08917, over 3613831.87 frames. ], batch size: 74, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:01:04,657 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 20 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-20 18:01:16,701 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.67 vs. limit=15.0 2024-08-20 18:01:21,640 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 31 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-20 18:01:35,627 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 20 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-20 18:01:39,029 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 
21 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-20 18:01:40,019 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4896390.0, ans=0.125 2024-08-20 18:01:51,473 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.790e+01 2.174e+01 2.397e+01 2.738e+01 4.303e+01, threshold=4.793e+01, percent-clipped=0.0 2024-08-20 18:02:02,873 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4896490.0, ans=0.125 2024-08-20 18:02:13,078 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 17 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-20 18:02:21,951 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 15 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-20 18:02:26,543 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 700, loss[loss=0.101, beats_loss=0.009894, ecapa_loss=0.0001307, whisper_loss=0.08981, over 18099.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01033, ecapa_loss=0.0001367, whisper_loss=0.08896, over 3665003.91 frames. ], batch size: 70, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:02:30,712 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 
26 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-20 18:02:34,754 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4896690.0, ans=0.07 2024-08-20 18:02:40,093 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4896690.0, ans=0.125 2024-08-20 18:03:01,072 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4896890.0, ans=0.0 2024-08-20 18:03:36,010 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4896990.0, ans=0.125 2024-08-20 18:03:39,549 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4897090.0, ans=0.1 2024-08-20 18:03:57,679 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 750, loss[loss=0.08539, beats_loss=0.008684, ecapa_loss=0.0001588, whisper_loss=0.07512, over 15327.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01022, ecapa_loss=0.0001363, whisper_loss=0.08923, over 3666317.25 frames. ], batch size: 60, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:03:59,899 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4897190.0, ans=0.125 2024-08-20 18:03:59,990 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4897190.0, ans=0.125 2024-08-20 18:04:10,599 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 
18 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-20 18:04:46,181 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4897390.0, ans=0.1 2024-08-20 18:04:49,713 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4897490.0, ans=0.1 2024-08-20 18:04:50,446 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.248e+01 2.436e+01 2.663e+01 4.558e+01, threshold=4.873e+01, percent-clipped=0.0 2024-08-20 18:04:53,859 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4897490.0, ans=0.125 2024-08-20 18:05:00,133 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 33 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-20 18:05:12,101 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 25 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-20 18:05:15,603 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 13 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-20 18:05:25,688 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 800, loss[loss=0.1087, beats_loss=0.009439, ecapa_loss=0.0001382, whisper_loss=0.09786, over 20583.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01027, ecapa_loss=0.0001366, whisper_loss=0.08866, over 3713337.64 frames. ], batch size: 81, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:05:36,390 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 32 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-20 18:05:38,181 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 31 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-20 18:05:42,629 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.06 vs. 
limit=22.5 2024-08-20 18:05:47,991 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.93 vs. limit=15.0 2024-08-20 18:06:09,845 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4897890.0, ans=0.0 2024-08-20 18:06:11,319 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 19 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-20 18:06:16,524 WARNING [optim.py:496] (1/4) Scaling gradients by 0.034612394869327545, model_norm_threshold=48.72909927368164 2024-08-20 18:06:16,940 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.18, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.559e+05, grad_sumsq=3.559e+05, orig_rms_sq=1.000e+00 2024-08-20 18:06:18,287 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 26 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-20 18:06:39,768 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 21 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-20 18:06:39,899 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4898090.0, ans=0.125 2024-08-20 18:06:43,834 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.48 vs. limit=22.5 2024-08-20 18:06:52,580 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 850, loss[loss=0.08347, beats_loss=0.01192, ecapa_loss=0.0001218, whisper_loss=0.07033, over 16600.00 frames. ], tot_loss[loss=0.09944, beats_loss=0.01028, ecapa_loss=0.000136, whisper_loss=0.0878, over 3664574.06 frames. 
], batch size: 67, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:06:59,302 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4898190.0, ans=0.035 2024-08-20 18:07:45,907 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.491e+01 2.263e+01 2.466e+01 2.785e+01 1.408e+03, threshold=4.933e+01, percent-clipped=1.0 2024-08-20 18:08:22,070 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 900, loss[loss=0.1011, beats_loss=0.008191, ecapa_loss=0.0001187, whisper_loss=0.09173, over 15943.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01017, ecapa_loss=0.0001375, whisper_loss=0.08873, over 3671726.30 frames. ], batch size: 60, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:08:24,792 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4898690.0, ans=0.1 2024-08-20 18:08:33,454 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 27 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-20 18:08:52,480 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4898790.0, ans=10.0 2024-08-20 18:09:10,224 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4898890.0, ans=0.125 2024-08-20 18:09:11,646 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4898890.0, ans=0.2 2024-08-20 18:09:18,912 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4898990.0, ans=0.2 2024-08-20 18:09:39,573 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. 
limit=6.0 2024-08-20 18:09:40,669 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4899090.0, ans=0.125 2024-08-20 18:09:51,118 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4899190.0, ans=0.1 2024-08-20 18:09:52,095 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 950, loss[loss=0.09823, beats_loss=0.009891, ecapa_loss=0.0001523, whisper_loss=0.08681, over 22425.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01021, ecapa_loss=0.0001367, whisper_loss=0.08912, over 3716897.20 frames. ], batch size: 92, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:09:55,121 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 22 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-20 18:09:56,199 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4899190.0, ans=0.125 2024-08-20 18:10:14,557 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 27 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-20 18:10:21,361 INFO [train_multi_KD3.py:845] (1/4) A total of 95 cuts. 26 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-20 18:10:27,303 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4899390.0, ans=0.0 2024-08-20 18:10:33,875 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 20 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-20 18:10:36,434 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.37 vs. 
limit=15.0 2024-08-20 18:10:40,104 WARNING [optim.py:496] (1/4) Scaling gradients by 0.03566131740808487, model_norm_threshold=49.32598114013672 2024-08-20 18:10:40,263 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.0.norm.log_scale with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.124e+05, grad_sumsq=3.124e+05, orig_rms_sq=1.000e+00 2024-08-20 18:10:43,562 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+01 2.227e+01 2.460e+01 2.712e+01 1.383e+03, threshold=4.920e+01, percent-clipped=1.0 2024-08-20 18:10:55,667 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4899490.0, ans=0.1 2024-08-20 18:10:58,046 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 27 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-20 18:11:20,422 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 1000, loss[loss=0.1252, beats_loss=0.006814, ecapa_loss=0.0001378, whisper_loss=0.1171, over 18160.00 frames. ], tot_loss[loss=0.09932, beats_loss=0.01035, ecapa_loss=0.0001368, whisper_loss=0.0876, over 3697887.15 frames. ], batch size: 67, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:11:25,317 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0 2024-08-20 18:11:38,273 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=4899790.0, ans=15.0 2024-08-20 18:11:38,985 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 34 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-20 18:11:42,617 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 
19 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-20 18:11:47,919 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=4899790.0, ans=0.5 2024-08-20 18:12:01,885 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 19 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-20 18:12:07,559 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4899890.0, ans=0.125 2024-08-20 18:12:24,447 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.04 vs. limit=15.0 2024-08-20 18:12:50,724 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 1050, loss[loss=0.07659, beats_loss=0.01116, ecapa_loss=0.0001287, whisper_loss=0.06415, over 14365.00 frames. ], tot_loss[loss=0.09942, beats_loss=0.0103, ecapa_loss=0.0001369, whisper_loss=0.08775, over 3705039.31 frames. ], batch size: 60, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:12:56,030 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 23 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-20 18:13:09,362 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.46 vs. 
limit=22.5 2024-08-20 18:13:35,677 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=4900390.0, ans=0.95 2024-08-20 18:13:43,083 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.230e+01 2.407e+01 2.713e+01 3.528e+01, threshold=4.815e+01, percent-clipped=0.0 2024-08-20 18:13:47,456 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=4900490.0, ans=0.5 2024-08-20 18:13:57,850 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4900490.0, ans=0.0 2024-08-20 18:14:06,004 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.94 vs. limit=15.0 2024-08-20 18:14:17,681 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 1100, loss[loss=0.08681, beats_loss=0.01285, ecapa_loss=0.0001238, whisper_loss=0.07272, over 17331.00 frames. ], tot_loss[loss=0.099, beats_loss=0.01034, ecapa_loss=0.0001362, whisper_loss=0.08729, over 3663618.90 frames. ], batch size: 69, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:14:22,324 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.99 vs. limit=15.0 2024-08-20 18:14:25,515 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4900690.0, ans=0.0 2024-08-20 18:14:27,725 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.27 vs. 
limit=12.0 2024-08-20 18:14:33,023 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4900690.0, ans=0.125 2024-08-20 18:14:34,824 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4900790.0, ans=0.125 2024-08-20 18:14:36,408 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4900790.0, ans=0.1 2024-08-20 18:14:52,470 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4900890.0, ans=0.0 2024-08-20 18:14:54,940 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 20 from LS+wenet, 32 from Vox, 30 fro AS 2024-08-20 18:14:57,025 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 13 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-20 18:14:58,676 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 24 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-20 18:15:22,931 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4900990.0, ans=0.0 2024-08-20 18:15:27,457 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 13 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-20 18:15:44,764 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 1150, loss[loss=0.07358, beats_loss=0.009994, ecapa_loss=0.0001477, whisper_loss=0.06211, over 14346.00 frames. ], tot_loss[loss=0.09919, beats_loss=0.0103, ecapa_loss=0.0001362, whisper_loss=0.08753, over 3640917.99 frames. ], batch size: 57, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:15:44,993 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 25 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-20 18:16:27,882 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 
34 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-20 18:16:36,632 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 23 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-20 18:16:38,484 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.753e+01 2.288e+01 2.529e+01 2.855e+01 5.753e+01, threshold=5.059e+01, percent-clipped=2.0 2024-08-20 18:16:48,126 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4901490.0, ans=0.125 2024-08-20 18:16:56,916 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4901590.0, ans=0.0 2024-08-20 18:17:14,344 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 1200, loss[loss=0.1029, beats_loss=0.01024, ecapa_loss=0.0001221, whisper_loss=0.09149, over 19719.00 frames. ], tot_loss[loss=0.09929, beats_loss=0.01029, ecapa_loss=0.0001362, whisper_loss=0.08764, over 3653682.27 frames. ], batch size: 76, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:17:22,943 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 35 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-20 18:17:27,272 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.77 vs. limit=22.5 2024-08-20 18:18:19,251 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 18:18:26,729 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 32 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-20 18:18:37,542 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4902090.0, ans=0.125 2024-08-20 18:18:39,815 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.72 vs. 
limit=15.0 2024-08-20 18:18:43,503 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 1250, loss[loss=0.1042, beats_loss=0.01132, ecapa_loss=0.0001332, whisper_loss=0.09155, over 21809.00 frames. ], tot_loss[loss=0.0995, beats_loss=0.01034, ecapa_loss=0.0001375, whisper_loss=0.08778, over 3684162.33 frames. ], batch size: 88, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:18:47,558 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4902190.0, ans=0.125 2024-08-20 18:18:54,197 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 15 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-20 18:19:14,555 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.99 vs. limit=22.5 2024-08-20 18:19:16,844 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 29 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-20 18:19:35,947 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.263e+01 2.557e+01 2.836e+01 4.039e+01, threshold=5.115e+01, percent-clipped=0.0 2024-08-20 18:19:58,503 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4902590.0, ans=0.125 2024-08-20 18:19:58,942 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.44 vs. limit=15.0 2024-08-20 18:20:11,663 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 1300, loss[loss=0.1258, beats_loss=0.009097, ecapa_loss=0.0001529, whisper_loss=0.1152, over 13747.00 frames. ], tot_loss[loss=0.09956, beats_loss=0.01038, ecapa_loss=0.0001365, whisper_loss=0.08782, over 3692425.14 frames. 
], batch size: 54, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:20:17,948 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4902690.0, ans=0.125 2024-08-20 18:20:33,832 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 20 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-20 18:20:35,446 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 21 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-20 18:20:38,785 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 28 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-20 18:20:59,623 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 28 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-20 18:21:02,006 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=4902890.0, ans=22.5 2024-08-20 18:21:09,865 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 29 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-20 18:21:17,418 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4902990.0, ans=0.0 2024-08-20 18:21:41,563 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 1350, loss[loss=0.1092, beats_loss=0.009725, ecapa_loss=0.0001594, whisper_loss=0.0979, over 21178.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01033, ecapa_loss=0.0001367, whisper_loss=0.08851, over 3724699.51 frames. ], batch size: 85, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:22:15,402 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.37 vs. limit=22.5 2024-08-20 18:22:21,160 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4903390.0, ans=0.0 2024-08-20 18:22:25,890 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 
31 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-20 18:22:34,606 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.706e+01 2.265e+01 2.449e+01 2.794e+01 7.955e+01, threshold=4.899e+01, percent-clipped=1.0 2024-08-20 18:23:10,310 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 1400, loss[loss=0.1011, beats_loss=0.007845, ecapa_loss=0.000144, whisper_loss=0.09185, over 16825.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01026, ecapa_loss=0.0001375, whisper_loss=0.08896, over 3699686.53 frames. ], batch size: 64, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:23:12,820 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.50 vs. limit=15.0 2024-08-20 18:23:28,419 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4903790.0, ans=0.2 2024-08-20 18:23:38,933 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 26 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-20 18:23:56,222 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 31 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-20 18:23:59,893 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 30 from LS+wenet, 18 from Vox, 17 fro AS 2024-08-20 18:24:09,110 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 25 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-20 18:24:12,630 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 30 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-20 18:24:38,115 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 1450, loss[loss=0.1042, beats_loss=0.01063, ecapa_loss=0.0001252, whisper_loss=0.09236, over 23328.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.0102, ecapa_loss=0.0001375, whisper_loss=0.08904, over 3703316.78 frames. ], batch size: 91, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:24:51,830 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 
32 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-20 18:24:53,861 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4904190.0, ans=0.125 2024-08-20 18:25:13,174 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4904390.0, ans=0.125 2024-08-20 18:25:17,977 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4904390.0, ans=0.1 2024-08-20 18:25:22,087 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.71 vs. limit=22.5 2024-08-20 18:25:32,168 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.688e+01 2.130e+01 2.447e+01 2.778e+01 4.776e+01, threshold=4.894e+01, percent-clipped=0.0 2024-08-20 18:25:36,140 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4904490.0, ans=0.1 2024-08-20 18:26:10,675 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 23 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-20 18:26:18,545 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4904590.0, ans=0.0 2024-08-20 18:26:31,376 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4904690.0, ans=0.0 2024-08-20 18:26:32,481 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 1500, loss[loss=0.07443, beats_loss=0.01088, ecapa_loss=0.0001413, whisper_loss=0.06214, over 14624.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01028, ecapa_loss=0.0001353, whisper_loss=0.08854, over 3699245.89 frames. 
], batch size: 59, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:26:37,151 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4904690.0, ans=0.125 2024-08-20 18:26:37,460 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.24 vs. limit=15.0 2024-08-20 18:26:38,821 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 22 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-20 18:26:51,546 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4904790.0, ans=0.125 2024-08-20 18:26:57,735 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4904790.0, ans=0.125 2024-08-20 18:27:04,969 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.01 vs. limit=15.0 2024-08-20 18:27:04,970 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.97 vs. limit=10.0 2024-08-20 18:27:07,003 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=4904790.0, ans=0.1 2024-08-20 18:27:18,292 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.29 vs. limit=15.0 2024-08-20 18:27:20,087 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.82 vs. 
limit=15.0 2024-08-20 18:27:34,156 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4904990.0, ans=0.0 2024-08-20 18:27:38,329 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4904990.0, ans=0.0 2024-08-20 18:27:44,879 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 30 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-20 18:28:04,928 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 1550, loss[loss=0.09116, beats_loss=0.01326, ecapa_loss=0.0001486, whisper_loss=0.07641, over 16878.00 frames. ], tot_loss[loss=0.09977, beats_loss=0.01023, ecapa_loss=0.0001366, whisper_loss=0.08818, over 3686911.73 frames. ], batch size: 71, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:28:11,669 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=4905190.0, ans=0.1 2024-08-20 18:28:18,625 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4905190.0, ans=0.1 2024-08-20 18:28:34,788 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 33 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-20 18:28:51,999 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4905390.0, ans=0.0 2024-08-20 18:28:56,220 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4905390.0, ans=0.0 2024-08-20 18:29:01,007 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.222e+01 2.378e+01 2.673e+01 8.948e+01, threshold=4.757e+01, percent-clipped=1.0 2024-08-20 18:29:15,611 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.63 vs. 
limit=15.0 2024-08-20 18:29:26,034 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 22 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-20 18:29:37,790 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4905690.0, ans=0.125 2024-08-20 18:29:38,722 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 1600, loss[loss=0.09236, beats_loss=0.01211, ecapa_loss=0.0001104, whisper_loss=0.07914, over 18913.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01019, ecapa_loss=0.0001358, whisper_loss=0.08856, over 3663993.27 frames. ], batch size: 73, lr: 1.81e-03, grad_scale: 1.152921504606847e+18 2024-08-20 18:29:49,998 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 23 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-20 18:29:57,253 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4905790.0, ans=0.125 2024-08-20 18:30:16,251 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 23 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-20 18:30:16,960 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.19 vs. limit=15.0 2024-08-20 18:30:18,110 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 
33 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-20 18:30:33,979 WARNING [optim.py:496] (1/4) Scaling gradients by 0.0239420123398304, model_norm_threshold=47.56806564331055 2024-08-20 18:30:34,138 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.639e+05, grad_sumsq=5.639e+05, orig_rms_sq=1.000e+00 2024-08-20 18:30:48,366 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4905990.0, ans=0.1 2024-08-20 18:30:49,303 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 20 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-20 18:30:56,123 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4906090.0, ans=0.125 2024-08-20 18:30:56,189 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4906090.0, ans=0.2 2024-08-20 18:31:02,015 WARNING [optim.py:496] (1/4) Scaling gradients by 0.05985227972269058, model_norm_threshold=47.56806564331055 2024-08-20 18:31:02,177 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.11, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.784e+04, grad_sumsq=6.784e+04, orig_rms_sq=1.000e+00 2024-08-20 18:31:10,630 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 1650, loss[loss=0.08652, beats_loss=0.009645, ecapa_loss=0.0001622, whisper_loss=0.07525, over 20029.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01017, ecapa_loss=0.0001366, whisper_loss=0.08895, over 3725754.10 frames. 
], batch size: 83, lr: 1.81e-03, grad_scale: 1.152921504606847e+18 2024-08-20 18:31:16,849 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4906190.0, ans=0.0 2024-08-20 18:31:22,339 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4906190.0, ans=0.125 2024-08-20 18:31:24,077 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 29 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-20 18:31:41,632 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 25 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-20 18:31:43,674 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.32 vs. limit=15.0 2024-08-20 18:31:46,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4906390.0, ans=0.2 2024-08-20 18:31:59,502 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 18:32:03,137 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4906490.0, ans=0.125 2024-08-20 18:32:04,561 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.937e+01 2.417e+01 2.742e+01 3.192e+01 1.987e+03, threshold=5.484e+01, percent-clipped=2.0 2024-08-20 18:32:04,750 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 
11 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-20 18:32:08,317 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4906490.0, ans=0.2 2024-08-20 18:32:12,303 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4906490.0, ans=0.125 2024-08-20 18:32:34,932 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.86 vs. limit=22.5 2024-08-20 18:32:36,401 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4906590.0, ans=0.125 2024-08-20 18:32:36,848 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.88 vs. limit=15.0 2024-08-20 18:32:39,586 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 1700, loss[loss=0.09835, beats_loss=0.01143, ecapa_loss=0.0001541, whisper_loss=0.08538, over 19771.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01016, ecapa_loss=0.0001372, whisper_loss=0.08947, over 3750799.45 frames. ], batch size: 81, lr: 1.81e-03, grad_scale: 1.152921504606847e+18 2024-08-20 18:33:22,490 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 15 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-20 18:33:52,299 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4907090.0, ans=0.125 2024-08-20 18:34:01,101 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 23 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-20 18:34:10,115 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 
24 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-20 18:34:11,703 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 1750, loss[loss=0.1015, beats_loss=0.01071, ecapa_loss=0.0001322, whisper_loss=0.08946, over 20751.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01008, ecapa_loss=0.0001379, whisper_loss=0.08957, over 3738463.56 frames. ], batch size: 80, lr: 1.81e-03, grad_scale: 1.152921504606847e+18 2024-08-20 18:34:11,952 INFO [train_multi_KD3.py:845] (1/4) A total of 52 cuts. 22 from LS+wenet, 8 from Vox, 22 fro AS 2024-08-20 18:34:19,573 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=4907190.0, ans=15.0 2024-08-20 18:34:39,668 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4907290.0, ans=0.125 2024-08-20 18:34:43,506 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 25 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-20 18:34:53,480 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 22 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-20 18:34:56,275 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.18 vs. limit=22.5 2024-08-20 18:35:05,588 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.746e+01 2.254e+01 2.590e+01 2.933e+01 3.656e+02, threshold=5.181e+01, percent-clipped=1.0 2024-08-20 18:35:11,005 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 26 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-20 18:35:11,251 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4907490.0, ans=0.0 2024-08-20 18:35:40,827 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 1800, loss[loss=0.115, beats_loss=0.008102, ecapa_loss=0.0001403, whisper_loss=0.1055, over 18970.00 frames. 
], tot_loss[loss=0.1012, beats_loss=0.01017, ecapa_loss=0.0001373, whisper_loss=0.08963, over 3752300.60 frames. ], batch size: 73, lr: 1.81e-03, grad_scale: 1.152921504606847e+18 2024-08-20 18:36:07,911 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 29 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-20 18:36:08,494 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.24 vs. limit=15.0 2024-08-20 18:36:11,504 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 12 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-20 18:36:11,777 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4907790.0, ans=0.2 2024-08-20 18:36:19,592 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4907890.0, ans=0.025 2024-08-20 18:36:34,846 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4907990.0, ans=0.0 2024-08-20 18:36:42,975 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.07 vs. limit=15.0 2024-08-20 18:36:49,305 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 11 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-20 18:37:09,307 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 1850, loss[loss=0.1036, beats_loss=0.01115, ecapa_loss=0.0001443, whisper_loss=0.09099, over 22129.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01022, ecapa_loss=0.0001366, whisper_loss=0.08957, over 3721692.91 frames. 
], batch size: 90, lr: 1.81e-03, grad_scale: 1.152921504606847e+18 2024-08-20 18:37:09,877 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4908190.0, ans=0.0 2024-08-20 18:37:34,537 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 41 from LS+wenet, 10 from Vox, 36 fro AS 2024-08-20 18:37:38,729 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4908290.0, ans=0.125 2024-08-20 18:37:40,172 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4908290.0, ans=0.125 2024-08-20 18:37:41,362 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 19 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-20 18:37:43,393 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4908390.0, ans=0.125 2024-08-20 18:37:50,024 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.03 vs. limit=15.0 2024-08-20 18:38:04,905 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.237e+01 2.438e+01 2.771e+01 3.802e+01, threshold=4.876e+01, percent-clipped=0.0 2024-08-20 18:38:07,589 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 18 from LS+wenet, 16 from Vox, 17 fro AS 2024-08-20 18:38:13,320 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 26 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-20 18:38:26,541 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 28 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-20 18:38:32,426 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 31 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-20 18:38:39,930 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 
22 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-20 18:38:43,210 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 1900, loss[loss=0.1018, beats_loss=0.007665, ecapa_loss=0.0001641, whisper_loss=0.0925, over 17711.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01021, ecapa_loss=0.0001353, whisper_loss=0.09027, over 3750444.65 frames. ], batch size: 70, lr: 1.81e-03, grad_scale: 1.152921504606847e+18 2024-08-20 18:38:45,466 INFO [train_multi_KD3.py:845] (1/4) A total of 96 cuts. 27 from LS+wenet, 14 from Vox, 55 fro AS 2024-08-20 18:38:48,181 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4908690.0, ans=0.04949747468305833 2024-08-20 18:39:01,382 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4908790.0, ans=0.125 2024-08-20 18:39:10,115 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 15 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-20 18:39:13,655 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 21 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-20 18:39:14,040 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4908790.0, ans=0.0 2024-08-20 18:39:35,778 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 29 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-20 18:39:41,529 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 25 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-20 18:40:05,961 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 22 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-20 18:40:18,052 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 1950, loss[loss=0.1158, beats_loss=0.01015, ecapa_loss=9.958e-05, whisper_loss=0.1046, over 23506.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01024, ecapa_loss=0.0001352, whisper_loss=0.08985, over 3748889.62 frames. 
], batch size: 89, lr: 1.81e-03, grad_scale: 1.152921504606847e+18 2024-08-20 18:40:39,306 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4909290.0, ans=0.125 2024-08-20 18:40:48,895 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4909290.0, ans=0.0 2024-08-20 18:40:53,629 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 20 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-20 18:40:59,993 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 31 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-20 18:41:06,081 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 30 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-20 18:41:14,303 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.290e+01 2.558e+01 2.755e+01 1.117e+02, threshold=5.115e+01, percent-clipped=1.0 2024-08-20 18:41:21,066 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4909490.0, ans=0.1 2024-08-20 18:41:31,487 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 18 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-20 18:41:33,222 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4909590.0, ans=0.2 2024-08-20 18:41:33,261 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4909590.0, ans=0.1 2024-08-20 18:41:46,143 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4909590.0, ans=0.1 2024-08-20 18:41:50,145 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.37 vs. 
limit=10.0 2024-08-20 18:41:50,630 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 2000, loss[loss=0.0928, beats_loss=0.008584, ecapa_loss=0.0001517, whisper_loss=0.0827, over 13594.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01026, ecapa_loss=0.0001356, whisper_loss=0.08954, over 3737405.51 frames. ], batch size: 53, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:42:29,377 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 13 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-20 18:42:32,883 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4909890.0, ans=0.125 2024-08-20 18:43:14,409 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.61 vs. limit=15.0 2024-08-20 18:43:20,486 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 2050, loss[loss=0.1034, beats_loss=0.01227, ecapa_loss=0.0001305, whisper_loss=0.08984, over 22810.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0103, ecapa_loss=0.0001356, whisper_loss=0.08985, over 3718949.00 frames. ], batch size: 92, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:43:24,425 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4910190.0, ans=0.07 2024-08-20 18:43:29,989 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4910190.0, ans=0.125 2024-08-20 18:43:30,956 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 
22 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-20 18:43:31,408 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4910190.0, ans=0.1 2024-08-20 18:43:50,391 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4910290.0, ans=0.125 2024-08-20 18:44:03,070 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 38 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-20 18:44:14,940 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.216e+01 2.451e+01 2.843e+01 3.787e+02, threshold=4.902e+01, percent-clipped=2.0 2024-08-20 18:44:28,722 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.45 vs. limit=12.0 2024-08-20 18:44:31,997 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4910590.0, ans=0.2 2024-08-20 18:44:48,998 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 2100, loss[loss=0.1215, beats_loss=0.009476, ecapa_loss=0.0001375, whisper_loss=0.1106, over 21127.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01028, ecapa_loss=0.0001345, whisper_loss=0.08985, over 3735885.70 frames. ], batch size: 83, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:45:00,772 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4910690.0, ans=0.125 2024-08-20 18:45:39,552 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 16 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-20 18:45:57,153 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4910990.0, ans=0.125 2024-08-20 18:46:10,149 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.59 vs. 
limit=12.0 2024-08-20 18:46:16,450 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4911190.0, ans=0.125 2024-08-20 18:46:17,946 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 2150, loss[loss=0.09072, beats_loss=0.008428, ecapa_loss=0.0001686, whisper_loss=0.08061, over 18021.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01036, ecapa_loss=0.0001331, whisper_loss=0.08913, over 3733108.54 frames. ], batch size: 71, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:46:44,586 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4911290.0, ans=0.2 2024-08-20 18:47:09,446 WARNING [optim.py:496] (1/4) Scaling gradients by 0.05905143544077873, model_norm_threshold=49.024410247802734 2024-08-20 18:47:09,608 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.10, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.632e+04, grad_sumsq=6.177e+06, orig_rms_sq=1.074e-02 2024-08-20 18:47:12,872 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.718e+01 2.265e+01 2.540e+01 2.946e+01 8.302e+02, threshold=5.079e+01, percent-clipped=3.0 2024-08-20 18:47:14,922 INFO [train_multi_KD3.py:845] (1/4) A total of 54 cuts. 14 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-20 18:47:23,396 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4911490.0, ans=0.0 2024-08-20 18:47:29,644 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.16 vs. limit=15.0 2024-08-20 18:47:31,534 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 
22 from LS+wenet, 22 from Vox, 49 fro AS 2024-08-20 18:47:31,797 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4911590.0, ans=0.125 2024-08-20 18:47:36,397 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 30 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-20 18:47:46,322 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 2200, loss[loss=0.1028, beats_loss=0.0112, ecapa_loss=0.0001058, whisper_loss=0.09052, over 22769.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01052, ecapa_loss=0.000132, whisper_loss=0.08875, over 3768696.52 frames. ], batch size: 85, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:47:49,617 WARNING [optim.py:496] (1/4) Scaling gradients by 0.06644881516695023, model_norm_threshold=50.791358947753906 2024-08-20 18:47:49,775 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.846e+04, grad_sumsq=7.846e+04, orig_rms_sq=1.000e+00 2024-08-20 18:47:52,012 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4911690.0, ans=0.1 2024-08-20 18:48:09,524 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4911790.0, ans=0.125 2024-08-20 18:48:16,860 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. limit=6.0 2024-08-20 18:48:28,159 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4911890.0, ans=0.125 2024-08-20 18:48:34,745 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 
30 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-20 18:48:42,580 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4911990.0, ans=0.125 2024-08-20 18:49:17,689 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 2250, loss[loss=0.1043, beats_loss=0.01007, ecapa_loss=0.0001144, whisper_loss=0.09311, over 23431.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01041, ecapa_loss=0.0001335, whisper_loss=0.08929, over 3744540.14 frames. ], batch size: 90, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:49:59,934 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 37 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-20 18:50:08,856 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4912390.0, ans=0.1 2024-08-20 18:50:14,379 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.207e+01 2.453e+01 2.665e+01 7.644e+02, threshold=4.907e+01, percent-clipped=1.0 2024-08-20 18:50:16,721 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.73 vs. limit=15.0 2024-08-20 18:50:43,115 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 25 from LS+wenet, 15 from Vox, 47 fro AS 2024-08-20 18:50:48,184 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 2300, loss[loss=0.07442, beats_loss=0.01061, ecapa_loss=0.0001244, whisper_loss=0.06256, over 16640.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01043, ecapa_loss=0.0001342, whisper_loss=0.09007, over 3789200.73 frames. ], batch size: 67, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:50:48,399 INFO [train_multi_KD3.py:845] (1/4) A total of 52 cuts. 
20 from LS+wenet, 10 from Vox, 22 fro AS 2024-08-20 18:51:01,556 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. limit=6.0 2024-08-20 18:51:15,812 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4912790.0, ans=0.1 2024-08-20 18:52:09,492 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4913090.0, ans=0.0 2024-08-20 18:52:17,057 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 2350, loss[loss=0.07928, beats_loss=0.01062, ecapa_loss=0.0001154, whisper_loss=0.0675, over 15032.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0104, ecapa_loss=0.0001361, whisper_loss=0.0908, over 3822385.16 frames. ], batch size: 58, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:52:28,199 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4913190.0, ans=0.125 2024-08-20 18:52:38,710 WARNING [optim.py:496] (1/4) Scaling gradients by 0.04937309771776199, model_norm_threshold=49.067115783691406 2024-08-20 18:52:38,868 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.547e+05, grad_sumsq=4.707e+04, orig_rms_sq=3.286e+00 2024-08-20 18:52:39,099 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 28 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-20 18:52:42,991 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.70 vs. limit=15.0 2024-08-20 18:52:46,500 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 22 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-20 18:52:47,831 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 
22 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-20 18:53:05,738 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.67 vs. limit=6.0 2024-08-20 18:53:08,300 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 18 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-20 18:53:13,619 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4913490.0, ans=0.125 2024-08-20 18:53:14,671 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.749e+01 2.360e+01 2.620e+01 2.900e+01 9.938e+02, threshold=5.241e+01, percent-clipped=3.0 2024-08-20 18:53:36,683 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 17 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-20 18:53:37,292 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.69 vs. limit=15.0 2024-08-20 18:53:38,549 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.349e+00 2024-08-20 18:53:39,946 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 17 from LS+wenet, 14 from Vox, 19 fro AS 2024-08-20 18:53:40,180 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4913590.0, ans=0.125 2024-08-20 18:53:45,341 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4913590.0, ans=0.125 2024-08-20 18:53:49,349 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 2400, loss[loss=0.1095, beats_loss=0.01209, ecapa_loss=0.0001362, whisper_loss=0.09608, over 16404.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01043, ecapa_loss=0.0001368, whisper_loss=0.09027, over 3805924.73 frames. 
], batch size: 63, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:53:51,427 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4913690.0, ans=0.125 2024-08-20 18:53:52,740 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4913690.0, ans=0.0 2024-08-20 18:54:03,642 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4913690.0, ans=0.2 2024-08-20 18:54:06,466 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 24 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-20 18:54:08,838 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.88 vs. limit=15.0 2024-08-20 18:54:22,251 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4913890.0, ans=0.125 2024-08-20 18:54:23,676 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 32 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-20 18:54:23,990 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4913890.0, ans=10.0 2024-08-20 18:54:25,993 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4913890.0, ans=0.04949747468305833 2024-08-20 18:54:27,384 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 17 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-20 18:54:34,470 INFO [train_multi_KD3.py:845] (1/4) A total of 54 cuts. 22 from LS+wenet, 12 from Vox, 20 fro AS 2024-08-20 18:54:41,892 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 
28 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-20 18:54:44,188 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4913990.0, ans=0.125 2024-08-20 18:54:51,057 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 29 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-20 18:55:13,133 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 25 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-20 18:55:15,899 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4914090.0, ans=0.125 2024-08-20 18:55:19,312 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 2450, loss[loss=0.0952, beats_loss=0.008769, ecapa_loss=0.0001657, whisper_loss=0.08478, over 18235.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01049, ecapa_loss=0.0001373, whisper_loss=0.08974, over 3813246.87 frames. ], batch size: 76, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:55:29,084 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4914190.0, ans=0.125 2024-08-20 18:55:34,669 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4914190.0, ans=0.0 2024-08-20 18:55:34,713 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4914190.0, ans=0.0 2024-08-20 18:56:01,497 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 
30 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-20 18:56:09,619 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4914390.0, ans=0.125 2024-08-20 18:56:16,542 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.269e+01 2.495e+01 2.810e+01 4.376e+01, threshold=4.990e+01, percent-clipped=0.0 2024-08-20 18:56:53,400 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 2500, loss[loss=0.08814, beats_loss=0.01059, ecapa_loss=0.0001529, whisper_loss=0.07602, over 18668.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01043, ecapa_loss=0.0001377, whisper_loss=0.08962, over 3791086.02 frames. ], batch size: 79, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:56:55,611 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4914690.0, ans=0.0 2024-08-20 18:57:02,829 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0 2024-08-20 18:57:08,759 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4914690.0, ans=0.0 2024-08-20 18:57:10,454 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4914790.0, ans=0.125 2024-08-20 18:57:20,710 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4914790.0, ans=0.125 2024-08-20 18:57:27,369 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 
15 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-20 18:57:41,656 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4914890.0, ans=0.1 2024-08-20 18:58:03,379 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4915090.0, ans=0.125 2024-08-20 18:58:17,044 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4915090.0, ans=0.125 2024-08-20 18:58:21,725 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 2550, loss[loss=0.1048, beats_loss=0.009201, ecapa_loss=0.0001568, whisper_loss=0.09401, over 21628.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01045, ecapa_loss=0.0001376, whisper_loss=0.08967, over 3799925.06 frames. ], batch size: 88, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:58:22,204 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4915190.0, ans=0.125 2024-08-20 18:58:23,373 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 11 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-20 18:58:31,014 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4915190.0, ans=0.125 2024-08-20 18:58:48,677 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 
24 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-20 18:58:49,526 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4915290.0, ans=0.125 2024-08-20 18:58:51,327 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4915290.0, ans=0.125 2024-08-20 18:58:52,809 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 18:58:58,396 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4915390.0, ans=0.125 2024-08-20 18:59:00,749 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4915390.0, ans=0.1 2024-08-20 18:59:10,104 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4915390.0, ans=0.0 2024-08-20 18:59:16,986 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 27 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-20 18:59:19,256 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.591e+01 2.358e+01 2.574e+01 2.752e+01 5.119e+01, threshold=5.148e+01, percent-clipped=1.0 2024-08-20 18:59:19,488 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 18 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-20 18:59:28,247 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4915490.0, ans=0.09899494936611666 2024-08-20 18:59:55,381 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 2600, loss[loss=0.1003, beats_loss=0.01279, ecapa_loss=0.000113, whisper_loss=0.08642, over 20409.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01042, ecapa_loss=0.0001382, whisper_loss=0.08988, over 3819910.48 frames. 
], batch size: 82, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:00:24,671 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.20 vs. limit=10.0 2024-08-20 19:00:31,858 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4915890.0, ans=0.125 2024-08-20 19:00:38,855 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 24 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-20 19:00:49,616 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 24 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-20 19:01:05,015 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 23 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-20 19:01:05,246 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4915990.0, ans=0.0 2024-08-20 19:01:09,077 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 21 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-20 19:01:20,984 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 18 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-20 19:01:26,746 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.07 vs. limit=15.0 2024-08-20 19:01:30,872 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 2650, loss[loss=0.09448, beats_loss=0.0118, ecapa_loss=0.0001187, whisper_loss=0.0815, over 16056.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01046, ecapa_loss=0.0001381, whisper_loss=0.0891, over 3815275.39 frames. 
], batch size: 62, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:01:48,603 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4916290.0, ans=0.1 2024-08-20 19:01:52,058 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4916290.0, ans=10.0 2024-08-20 19:01:52,124 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4916290.0, ans=0.04949747468305833 2024-08-20 19:02:02,524 WARNING [optim.py:496] (1/4) Scaling gradients by 0.0383872389793396, model_norm_threshold=51.48301696777344 2024-08-20 19:02:02,680 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.12, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.236e+05, grad_sumsq=2.236e+05, orig_rms_sq=1.000e+00 2024-08-20 19:02:02,967 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 21 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-20 19:02:17,381 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4916390.0, ans=0.125 2024-08-20 19:02:25,488 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.307e+01 2.525e+01 3.012e+01 1.341e+03, threshold=5.051e+01, percent-clipped=2.0 2024-08-20 19:02:26,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4916490.0, ans=0.125 2024-08-20 19:02:33,182 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 30 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-20 19:02:44,030 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 
24 from LS+wenet, 21 from Vox, 49 fro AS 2024-08-20 19:02:59,056 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 2700, loss[loss=0.08562, beats_loss=0.01213, ecapa_loss=9.997e-05, whisper_loss=0.07249, over 18258.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01046, ecapa_loss=0.0001379, whisper_loss=0.0887, over 3810159.33 frames. ], batch size: 71, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:03:11,654 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 21 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-20 19:03:27,436 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4916790.0, ans=0.0 2024-08-20 19:04:04,413 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 28 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-20 19:04:08,667 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4917090.0, ans=0.125 2024-08-20 19:04:18,414 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4917090.0, ans=0.0 2024-08-20 19:04:24,197 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.62 vs. limit=15.0 2024-08-20 19:04:26,609 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 33 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-20 19:04:27,974 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 2750, loss[loss=0.1176, beats_loss=0.008808, ecapa_loss=0.0001366, whisper_loss=0.1075, over 23011.00 frames. ], tot_loss[loss=0.09998, beats_loss=0.01047, ecapa_loss=0.0001377, whisper_loss=0.08813, over 3813292.22 frames. 
], batch size: 89, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:04:37,615 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4917190.0, ans=0.1 2024-08-20 19:05:03,592 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 35 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-20 19:05:09,761 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4917390.0, ans=0.1 2024-08-20 19:05:16,707 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 23 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-20 19:05:19,287 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4917390.0, ans=0.125 2024-08-20 19:05:28,449 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.332e+01 2.555e+01 2.897e+01 4.432e+01, threshold=5.110e+01, percent-clipped=0.0 2024-08-20 19:05:34,635 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.44 vs. limit=22.5 2024-08-20 19:05:41,130 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 20 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-20 19:05:53,768 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4917590.0, ans=0.125 2024-08-20 19:05:55,528 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4917590.0, ans=0.2 2024-08-20 19:06:04,498 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 2800, loss[loss=0.112, beats_loss=0.007092, ecapa_loss=0.0001466, whisper_loss=0.1034, over 18003.00 frames. ], tot_loss[loss=0.09992, beats_loss=0.01049, ecapa_loss=0.0001382, whisper_loss=0.08804, over 3820450.91 frames. 
], batch size: 66, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:06:13,407 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.62 vs. limit=12.0 2024-08-20 19:06:13,417 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.08 vs. limit=15.0 2024-08-20 19:06:16,436 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4917690.0, ans=0.0 2024-08-20 19:06:23,500 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 20 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-20 19:06:25,485 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4917790.0, ans=0.0 2024-08-20 19:06:40,354 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-20 19:06:40,859 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.41 vs. limit=12.0 2024-08-20 19:06:55,003 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4917890.0, ans=0.0 2024-08-20 19:06:57,231 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.11 vs. limit=22.5 2024-08-20 19:06:58,746 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.97 vs. limit=15.0 2024-08-20 19:07:11,187 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.16 vs. limit=15.0 2024-08-20 19:07:22,425 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 
23 from LS+wenet, 34 from Vox, 36 fro AS 2024-08-20 19:07:22,959 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.10 vs. limit=15.0 2024-08-20 19:07:32,812 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 2850, loss[loss=0.1029, beats_loss=0.009169, ecapa_loss=0.0001371, whisper_loss=0.09234, over 15370.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01042, ecapa_loss=0.0001385, whisper_loss=0.08871, over 3838591.35 frames. ], batch size: 58, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:08:14,922 INFO [train_multi_KD3.py:845] (1/4) A total of 97 cuts. 21 from LS+wenet, 31 from Vox, 45 fro AS 2024-08-20 19:08:29,374 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.736e+01 2.343e+01 2.572e+01 2.859e+01 3.545e+01, threshold=5.143e+01, percent-clipped=0.0 2024-08-20 19:08:33,584 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4918490.0, ans=0.125 2024-08-20 19:09:02,698 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.97 vs. limit=22.5 2024-08-20 19:09:03,471 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 2900, loss[loss=0.1011, beats_loss=0.008727, ecapa_loss=0.0001538, whisper_loss=0.09085, over 20133.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.0104, ecapa_loss=0.0001385, whisper_loss=0.0886, over 3835084.14 frames. ], batch size: 80, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:09:03,696 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 
27 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-20 19:09:11,219 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4918690.0, ans=0.125 2024-08-20 19:09:20,686 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4918790.0, ans=0.125 2024-08-20 19:09:22,306 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 21 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-20 19:09:25,834 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4918790.0, ans=0.125 2024-08-20 19:09:34,482 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4918790.0, ans=0.125 2024-08-20 19:09:35,816 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 14 from LS+wenet, 10 from Vox, 27 fro AS 2024-08-20 19:09:46,827 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 28 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-20 19:09:48,803 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4918890.0, ans=0.125 2024-08-20 19:10:08,627 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.72 vs. limit=5.0 2024-08-20 19:10:32,715 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 2950, loss[loss=0.09369, beats_loss=0.01051, ecapa_loss=0.0001741, whisper_loss=0.08144, over 18981.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01041, ecapa_loss=0.0001397, whisper_loss=0.0891, over 3819010.10 frames. ], batch size: 81, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:10:38,686 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4919190.0, ans=0.0 2024-08-20 19:11:29,427 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 
25 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-20 19:11:30,767 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.232e+01 2.550e+01 2.898e+01 2.799e+02, threshold=5.099e+01, percent-clipped=2.0 2024-08-20 19:11:50,086 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.58 vs. limit=12.0 2024-08-20 19:11:51,820 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4919590.0, ans=0.125 2024-08-20 19:12:00,798 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4919590.0, ans=0.125 2024-08-20 19:12:03,174 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.17 vs. limit=15.0 2024-08-20 19:12:05,890 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 3000, loss[loss=0.07675, beats_loss=0.01235, ecapa_loss=0.00011, whisper_loss=0.0633, over 16442.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01039, ecapa_loss=0.0001401, whisper_loss=0.08868, over 3808704.12 frames. ], batch size: 64, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:12:05,890 INFO [train_multi_KD3.py:1140] (1/4) Computing validation loss 2024-08-20 19:12:42,545 INFO [train_multi_KD3.py:1150] (1/4) Epoch 34, validation on ASR_libri: loss=0.2544, beats_loss=0, ecapa_loss=0.000513, whisper_loss=0.2492, over 931116.00 frames. 2024-08-20 19:13:06,130 INFO [train_multi_KD3.py:1150] (1/4) Epoch 34, validation on SV_voxceleb1: loss=0.003961, beats_loss=0, ecapa_loss=0.0003961, whisper_loss=0, over 944235.00 frames. 
2024-08-20 19:13:54,147 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.9137, 1.6003, 1.9889, 0.9277], device='cuda:1') 2024-08-20 19:14:44,838 INFO [train_multi_KD3.py:1150] (1/4) Epoch 34, validation on AT_audioset: loss=0.02302, beats_loss=0.02302, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-20 19:14:44,842 INFO [train_multi_KD3.py:1156] (1/4) Maximum memory allocated so far is 30838MB 2024-08-20 19:15:20,219 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4919890.0, ans=0.125 2024-08-20 19:15:41,759 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4919990.0, ans=0.125 2024-08-20 19:16:04,865 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4920090.0, ans=10.0 2024-08-20 19:16:11,619 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 15 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-20 19:16:15,031 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 3050, loss[loss=0.1029, beats_loss=0.009434, ecapa_loss=0.0001425, whisper_loss=0.09206, over 22116.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.0104, ecapa_loss=0.0001391, whisper_loss=0.08878, over 3811752.31 frames. ], batch size: 89, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:16:26,904 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 
27 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-20 19:16:27,138 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4920190.0, ans=0.04949747468305833 2024-08-20 19:17:08,306 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4920490.0, ans=0.125 2024-08-20 19:17:08,333 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4920490.0, ans=0.125 2024-08-20 19:17:08,660 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.63 vs. limit=15.0 2024-08-20 19:17:09,406 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.284e+01 2.588e+01 2.897e+01 2.080e+02, threshold=5.176e+01, percent-clipped=1.0 2024-08-20 19:17:16,268 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4920490.0, ans=0.125 2024-08-20 19:17:41,203 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 3100, loss[loss=0.07729, beats_loss=0.01039, ecapa_loss=0.0001325, whisper_loss=0.06557, over 14245.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01033, ecapa_loss=0.0001385, whisper_loss=0.09045, over 3850705.33 frames. ], batch size: 56, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:17:46,779 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.53 vs. limit=15.0 2024-08-20 19:17:56,917 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 30 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-20 19:17:57,516 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.35 vs. 
limit=12.0 2024-08-20 19:18:14,905 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 16 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-20 19:18:19,389 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4920890.0, ans=0.0 2024-08-20 19:18:26,363 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 19 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-20 19:19:11,049 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 3150, loss[loss=0.114, beats_loss=0.00756, ecapa_loss=0.0001631, whisper_loss=0.1048, over 15463.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01036, ecapa_loss=0.0001388, whisper_loss=0.09051, over 3841639.92 frames. ], batch size: 61, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:19:23,563 INFO [train_multi_KD3.py:845] (1/4) A total of 95 cuts. 25 from LS+wenet, 30 from Vox, 40 fro AS 2024-08-20 19:19:34,983 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 27 from LS+wenet, 25 from Vox, 21 fro AS 2024-08-20 19:19:44,106 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4921290.0, ans=0.0 2024-08-20 19:19:51,263 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.23 vs. limit=12.0 2024-08-20 19:19:56,096 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4921390.0, ans=0.125 2024-08-20 19:20:06,005 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.207e+01 2.457e+01 2.685e+01 3.583e+01, threshold=4.914e+01, percent-clipped=0.0 2024-08-20 19:20:14,771 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 
23 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-20 19:20:16,413 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4921490.0, ans=0.0 2024-08-20 19:20:18,018 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 35 from LS+wenet, 27 from Vox, 26 fro AS 2024-08-20 19:20:38,065 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 3200, loss[loss=0.08957, beats_loss=0.01142, ecapa_loss=0.0001578, whisper_loss=0.07658, over 21528.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01038, ecapa_loss=0.0001383, whisper_loss=0.09058, over 3844395.86 frames. ], batch size: 91, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:20:39,193 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.52 vs. limit=22.5 2024-08-20 19:20:40,584 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4921690.0, ans=0.0 2024-08-20 19:20:42,698 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4921690.0, ans=0.0 2024-08-20 19:20:56,280 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4921790.0, ans=0.125 2024-08-20 19:21:04,609 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4921790.0, ans=0.125 2024-08-20 19:21:19,088 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.19 vs. 
limit=15.0 2024-08-20 19:21:25,128 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4921890.0, ans=0.125 2024-08-20 19:22:00,283 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4922090.0, ans=0.1 2024-08-20 19:22:01,682 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4922190.0, ans=0.125 2024-08-20 19:22:03,073 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 3250, loss[loss=0.1041, beats_loss=0.01023, ecapa_loss=0.0001267, whisper_loss=0.0926, over 21712.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01039, ecapa_loss=0.0001377, whisper_loss=0.09067, over 3823620.18 frames. ], batch size: 89, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:22:05,359 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 26 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-20 19:22:06,094 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.09 vs. limit=15.0 2024-08-20 19:22:12,567 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4922190.0, ans=0.0 2024-08-20 19:22:15,832 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 
31 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-20 19:22:16,026 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4922190.0, ans=0.0 2024-08-20 19:22:56,412 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.736e+01 2.200e+01 2.511e+01 2.776e+01 3.425e+01, threshold=5.022e+01, percent-clipped=0.0 2024-08-20 19:22:56,948 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4922490.0, ans=0.1 2024-08-20 19:23:08,876 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 29 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-20 19:23:22,462 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 19:23:28,477 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 3300, loss[loss=0.1019, beats_loss=0.0117, ecapa_loss=0.0001386, whisper_loss=0.08877, over 18688.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01036, ecapa_loss=0.0001377, whisper_loss=0.09156, over 3857608.58 frames. ], batch size: 76, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:23:34,557 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4922690.0, ans=0.1 2024-08-20 19:23:36,000 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4922690.0, ans=0.07 2024-08-20 19:23:41,084 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 
17 from LS+wenet, 20 from Vox, 13 fro AS 2024-08-20 19:23:41,403 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4922690.0, ans=0.125 2024-08-20 19:23:55,216 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4922790.0, ans=0.2 2024-08-20 19:23:55,526 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.84 vs. limit=22.5 2024-08-20 19:23:58,849 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 29 from LS+wenet, 17 from Vox, 16 fro AS 2024-08-20 19:24:09,971 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 26 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-20 19:24:18,432 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 29 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-20 19:24:23,635 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 31 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-20 19:24:37,535 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4923090.0, ans=0.0 2024-08-20 19:24:41,642 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.72 vs. limit=15.0 2024-08-20 19:24:42,858 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4923090.0, ans=0.125 2024-08-20 19:24:44,864 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.10 vs. 
limit=15.0 2024-08-20 19:24:46,746 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4923090.0, ans=0.125 2024-08-20 19:24:55,509 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 3350, loss[loss=0.09485, beats_loss=0.01047, ecapa_loss=0.0001747, whisper_loss=0.08264, over 18572.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01035, ecapa_loss=0.0001387, whisper_loss=0.09182, over 3878613.68 frames. ], batch size: 79, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:25:31,157 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4923390.0, ans=0.0 2024-08-20 19:25:38,100 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4923390.0, ans=0.125 2024-08-20 19:25:42,679 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4923390.0, ans=0.125 2024-08-20 19:25:46,078 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 19:25:49,366 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.845e+01 2.220e+01 2.419e+01 2.738e+01 3.918e+01, threshold=4.837e+01, percent-clipped=0.0 2024-08-20 19:25:49,901 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4923490.0, ans=0.2 2024-08-20 19:26:05,074 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4923590.0, ans=0.0 2024-08-20 19:26:19,249 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4923590.0, ans=0.125 2024-08-20 19:26:21,889 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 3400, loss[loss=0.1027, beats_loss=0.01036, 
ecapa_loss=0.0001379, whisper_loss=0.09097, over 14279.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01039, ecapa_loss=0.0001385, whisper_loss=0.0908, over 3832392.05 frames. ], batch size: 56, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:26:30,927 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 35 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-20 19:26:43,745 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.21 vs. limit=15.0 2024-08-20 19:27:20,414 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4923990.0, ans=0.1 2024-08-20 19:27:44,094 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 20 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-20 19:27:46,172 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4924090.0, ans=0.04949747468305833 2024-08-20 19:27:48,629 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 3450, loss[loss=0.09759, beats_loss=0.01132, ecapa_loss=0.0001141, whisper_loss=0.08513, over 18046.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01031, ecapa_loss=0.0001396, whisper_loss=0.09065, over 3804653.83 frames. ], batch size: 73, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:27:50,990 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 20 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-20 19:28:10,256 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 36 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-20 19:28:20,878 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4924290.0, ans=0.1 2024-08-20 19:28:36,309 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.69 vs. 
limit=22.5 2024-08-20 19:28:42,621 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.395e+01 2.734e+01 3.067e+01 2.505e+02, threshold=5.467e+01, percent-clipped=4.0 2024-08-20 19:28:43,180 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4924490.0, ans=0.1 2024-08-20 19:28:51,522 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4924490.0, ans=0.0 2024-08-20 19:28:55,001 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4924490.0, ans=0.1 2024-08-20 19:28:56,506 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 19 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-20 19:29:12,420 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.71 vs. limit=15.0 2024-08-20 19:29:15,162 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 3500, loss[loss=0.09069, beats_loss=0.01019, ecapa_loss=0.0001573, whisper_loss=0.07893, over 20251.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01034, ecapa_loss=0.0001403, whisper_loss=0.09015, over 3782832.28 frames. ], batch size: 85, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:29:29,603 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.28 vs. limit=15.0 2024-08-20 19:29:37,881 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 
14 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-20 19:29:41,651 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=4924790.0, ans=0.95 2024-08-20 19:29:45,185 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4924790.0, ans=0.0 2024-08-20 19:29:57,077 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 25 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-20 19:29:58,825 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 23 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-20 19:30:04,623 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 29 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-20 19:30:10,914 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 18 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-20 19:30:28,624 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 14 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-20 19:30:42,047 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.72 vs. limit=22.5 2024-08-20 19:30:42,349 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 3550, loss[loss=0.09764, beats_loss=0.01015, ecapa_loss=0.000143, whisper_loss=0.08606, over 23691.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01039, ecapa_loss=0.0001403, whisper_loss=0.08973, over 3770646.93 frames. ], batch size: 93, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:30:44,809 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.49 vs. limit=22.5 2024-08-20 19:30:57,478 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.20 vs. 
limit=15.0 2024-08-20 19:31:09,395 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.893e+01 2024-08-20 19:31:25,035 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4925390.0, ans=0.125 2024-08-20 19:31:36,715 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.776e+01 2.289e+01 2.465e+01 2.729e+01 3.504e+01, threshold=4.930e+01, percent-clipped=0.0 2024-08-20 19:31:56,132 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4925590.0, ans=0.125 2024-08-20 19:31:58,462 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.44 vs. limit=10.0 2024-08-20 19:32:04,610 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 32 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-20 19:32:05,933 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 19 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-20 19:32:09,076 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 3600, loss[loss=0.123, beats_loss=0.007801, ecapa_loss=0.0001854, whisper_loss=0.1134, over 21461.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01039, ecapa_loss=0.0001402, whisper_loss=0.08957, over 3759387.33 frames. ], batch size: 88, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:32:12,795 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 16 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-20 19:32:20,284 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4925690.0, ans=0.0 2024-08-20 19:32:33,197 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 
17 from LS+wenet, 24 from Vox, 53 fro AS 2024-08-20 19:33:14,222 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=4925990.0, ans=10.0 2024-08-20 19:33:24,392 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4926090.0, ans=0.125 2024-08-20 19:33:30,252 INFO [train_multi_KD3.py:845] (1/4) A total of 95 cuts. 18 from LS+wenet, 32 from Vox, 45 fro AS 2024-08-20 19:33:34,574 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 3650, loss[loss=0.1111, beats_loss=0.01009, ecapa_loss=0.0001278, whisper_loss=0.09978, over 21888.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01033, ecapa_loss=0.0001393, whisper_loss=0.08998, over 3753720.99 frames. ], batch size: 83, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:33:34,796 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 15 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-20 19:33:48,835 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 20 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-20 19:34:03,962 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4926290.0, ans=0.125 2024-08-20 19:34:22,906 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.68 vs. limit=15.0 2024-08-20 19:34:24,149 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4926390.0, ans=0.125 2024-08-20 19:34:28,497 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.676e+01 2.198e+01 2.421e+01 2.738e+01 4.465e+02, threshold=4.843e+01, percent-clipped=1.0 2024-08-20 19:34:30,130 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 
30 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-20 19:34:38,069 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4926490.0, ans=0.2 2024-08-20 19:34:47,963 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 24 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-20 19:34:58,803 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4926590.0, ans=0.125 2024-08-20 19:35:01,486 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 3700, loss[loss=0.1131, beats_loss=0.008733, ecapa_loss=0.0001277, whisper_loss=0.1031, over 21799.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01034, ecapa_loss=0.0001382, whisper_loss=0.09014, over 3765990.32 frames. ], batch size: 84, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:35:22,825 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.81 vs. limit=15.0 2024-08-20 19:35:44,932 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 17 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-20 19:35:56,418 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.06 vs. limit=15.0 2024-08-20 19:36:04,141 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4926990.0, ans=0.125 2024-08-20 19:36:22,213 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 22 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-20 19:36:24,567 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4927090.0, ans=0.0 2024-08-20 19:36:28,919 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.30 vs. 
limit=15.0 2024-08-20 19:36:29,326 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 3750, loss[loss=0.08461, beats_loss=0.01278, ecapa_loss=0.000102, whisper_loss=0.07081, over 15715.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01038, ecapa_loss=0.0001384, whisper_loss=0.08965, over 3731410.21 frames. ], batch size: 59, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:37:00,057 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.60 vs. limit=15.0 2024-08-20 19:37:03,394 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.16 vs. limit=22.5 2024-08-20 19:37:13,074 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4927390.0, ans=0.125 2024-08-20 19:37:22,988 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.233e+01 2.505e+01 2.774e+01 5.527e+01, threshold=5.010e+01, percent-clipped=2.0 2024-08-20 19:37:40,430 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4927590.0, ans=0.125 2024-08-20 19:37:55,086 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 3800, loss[loss=0.1015, beats_loss=0.01052, ecapa_loss=0.0001362, whisper_loss=0.0896, over 23493.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01036, ecapa_loss=0.0001387, whisper_loss=0.09006, over 3749458.14 frames. ], batch size: 96, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:37:55,280 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 30 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-20 19:37:59,417 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4927690.0, ans=0.125 2024-08-20 19:38:00,442 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 
29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-20 19:38:19,773 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4927790.0, ans=0.125 2024-08-20 19:38:35,476 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4927890.0, ans=0.125 2024-08-20 19:38:41,333 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.58 vs. limit=15.0 2024-08-20 19:38:44,237 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 28 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-20 19:39:17,630 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4928090.0, ans=0.0 2024-08-20 19:39:21,972 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 3850, loss[loss=0.1061, beats_loss=0.009516, ecapa_loss=0.0001255, whisper_loss=0.09533, over 18420.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01032, ecapa_loss=0.0001395, whisper_loss=0.09054, over 3760614.70 frames. ], batch size: 71, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:39:22,227 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 
19 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-20 19:39:36,138 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4928190.0, ans=0.125 2024-08-20 19:39:58,758 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4928390.0, ans=0.125 2024-08-20 19:40:15,629 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4928490.0, ans=0.2 2024-08-20 19:40:16,896 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.695e+01 2.370e+01 2.629e+01 2.963e+01 4.700e+01, threshold=5.257e+01, percent-clipped=0.0 2024-08-20 19:40:43,354 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 19:40:46,682 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4928590.0, ans=0.0 2024-08-20 19:40:50,861 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 3900, loss[loss=0.1054, beats_loss=0.008562, ecapa_loss=0.0001502, whisper_loss=0.09532, over 14415.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01033, ecapa_loss=0.0001391, whisper_loss=0.09002, over 3764496.90 frames. ], batch size: 53, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:41:05,237 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4928690.0, ans=10.0 2024-08-20 19:41:06,685 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 14 from LS+wenet, 24 from Vox, 21 fro AS 2024-08-20 19:41:19,113 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 20 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-20 19:41:34,106 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 
19 from LS+wenet, 30 from Vox, 40 fro AS 2024-08-20 19:41:36,471 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4928890.0, ans=0.0 2024-08-20 19:42:16,734 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 3950, loss[loss=0.1079, beats_loss=0.008952, ecapa_loss=0.0001607, whisper_loss=0.09733, over 14206.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01033, ecapa_loss=0.0001395, whisper_loss=0.09014, over 3782293.56 frames. ], batch size: 57, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:42:35,945 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4929290.0, ans=0.125 2024-08-20 19:42:41,318 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4929290.0, ans=0.125 2024-08-20 19:42:56,015 INFO [train_multi_KD3.py:845] (1/4) A total of 52 cuts. 13 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-20 19:43:11,938 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.377e+01 2.625e+01 2.908e+01 3.824e+01, threshold=5.250e+01, percent-clipped=0.0 2024-08-20 19:43:16,182 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4929490.0, ans=0.125 2024-08-20 19:43:24,902 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4929490.0, ans=0.07 2024-08-20 19:43:28,380 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4929590.0, ans=0.0 2024-08-20 19:43:44,290 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 4000, loss[loss=0.1087, beats_loss=0.01013, ecapa_loss=0.0001623, whisper_loss=0.09693, over 17032.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01036, ecapa_loss=0.0001394, whisper_loss=0.09022, over 3796891.04 frames. 
], batch size: 68, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:43:49,719 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4929690.0, ans=0.0 2024-08-20 19:43:55,254 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4929690.0, ans=0.0 2024-08-20 19:44:04,521 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.22 vs. limit=12.0 2024-08-20 19:44:38,851 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 24 from LS+wenet, 33 from Vox, 35 fro AS 2024-08-20 19:44:44,185 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4929990.0, ans=0.2 2024-08-20 19:44:46,197 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.98 vs. limit=22.5 2024-08-20 19:44:51,027 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 25 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-20 19:44:53,271 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4929990.0, ans=0.1 2024-08-20 19:44:54,599 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 16 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-20 19:44:57,385 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.35 vs. 
limit=15.0 2024-08-20 19:45:05,437 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4930090.0, ans=0.1 2024-08-20 19:45:07,302 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4930090.0, ans=0.0 2024-08-20 19:45:14,048 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 4050, loss[loss=0.1231, beats_loss=0.0091, ecapa_loss=0.0001311, whisper_loss=0.1127, over 17252.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01026, ecapa_loss=0.0001401, whisper_loss=0.09089, over 3788937.34 frames. ], batch size: 65, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:45:39,543 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=4930290.0, ans=0.05 2024-08-20 19:45:52,697 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 30 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-20 19:46:07,200 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4930490.0, ans=0.95 2024-08-20 19:46:11,278 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.272e+01 2.496e+01 2.748e+01 3.675e+01, threshold=4.991e+01, percent-clipped=0.0 2024-08-20 19:46:23,199 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4930490.0, ans=0.2 2024-08-20 19:46:35,015 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4930590.0, ans=0.2 2024-08-20 19:46:44,569 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 4100, loss[loss=0.0786, beats_loss=0.01054, ecapa_loss=0.0001574, whisper_loss=0.06648, over 17190.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01029, ecapa_loss=0.0001408, whisper_loss=0.09084, over 3814466.87 frames. 
], batch size: 69, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:47:09,929 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.09 vs. limit=22.5 2024-08-20 19:47:13,412 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.62 vs. limit=12.0 2024-08-20 19:47:28,939 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4930890.0, ans=0.04949747468305833 2024-08-20 19:48:12,584 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 4150, loss[loss=0.1175, beats_loss=0.01071, ecapa_loss=0.0001541, whisper_loss=0.1052, over 17081.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0104, ecapa_loss=0.0001396, whisper_loss=0.09018, over 3801165.31 frames. ], batch size: 70, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:48:15,630 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. limit=6.0 2024-08-20 19:48:55,917 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4931390.0, ans=0.1 2024-08-20 19:49:00,537 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 22 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-20 19:49:02,169 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 13 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-20 19:49:06,268 INFO [train_multi_KD3.py:845] (1/4) A total of 52 cuts. 
17 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-20 19:49:09,971 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.357e+01 2.563e+01 2.804e+01 4.051e+01, threshold=5.127e+01, percent-clipped=0.0 2024-08-20 19:49:14,466 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=4931490.0, ans=0.02 2024-08-20 19:49:18,151 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.92 vs. limit=15.0 2024-08-20 19:49:27,152 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4931590.0, ans=0.125 2024-08-20 19:49:28,733 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.61 vs. limit=15.0 2024-08-20 19:49:29,737 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 20 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-20 19:49:42,267 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 4200, loss[loss=0.09015, beats_loss=0.01238, ecapa_loss=0.0001358, whisper_loss=0.07641, over 13579.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01045, ecapa_loss=0.0001387, whisper_loss=0.09024, over 3789157.84 frames. ], batch size: 56, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:49:57,736 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.37 vs. limit=22.5 2024-08-20 19:50:02,144 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 19 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-20 19:50:08,612 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4931790.0, ans=0.125 2024-08-20 19:50:10,153 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 
30 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-20 19:50:24,401 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4931890.0, ans=0.0 2024-08-20 19:50:24,436 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4931890.0, ans=0.125 2024-08-20 19:50:28,112 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4931890.0, ans=0.125 2024-08-20 19:50:31,603 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4931890.0, ans=0.0 2024-08-20 19:50:33,755 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4931890.0, ans=0.125 2024-08-20 19:51:11,292 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 4250, loss[loss=0.08146, beats_loss=0.01219, ecapa_loss=0.0001372, whisper_loss=0.06789, over 19990.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01066, ecapa_loss=0.0001386, whisper_loss=0.08977, over 3763969.85 frames. 
], batch size: 82, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:51:11,748 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4932190.0, ans=0.125 2024-08-20 19:51:17,528 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4932190.0, ans=0.0 2024-08-20 19:51:25,332 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=4932190.0, ans=15.0 2024-08-20 19:51:33,077 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4932290.0, ans=0.1 2024-08-20 19:51:44,194 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4932290.0, ans=0.2 2024-08-20 19:52:07,547 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4932490.0, ans=0.125 2024-08-20 19:52:08,803 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.607e+01 2.265e+01 2.580e+01 2.966e+01 3.429e+02, threshold=5.160e+01, percent-clipped=3.0 2024-08-20 19:52:21,239 WARNING [optim.py:496] (1/4) Scaling gradients by 0.022736379876732826, model_norm_threshold=51.5983772277832 2024-08-20 19:52:21,398 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.129e+05, grad_sumsq=7.129e+05, orig_rms_sq=1.000e+00 2024-08-20 19:52:24,487 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 31 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-20 19:52:26,395 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 
20 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-20 19:52:35,322 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4932590.0, ans=0.2 2024-08-20 19:52:36,307 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 23 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-20 19:52:39,734 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 4300, loss[loss=0.09719, beats_loss=0.01179, ecapa_loss=0.0001242, whisper_loss=0.08416, over 20297.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01066, ecapa_loss=0.0001378, whisper_loss=0.08948, over 3750880.67 frames. ], batch size: 83, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:53:22,067 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4932890.0, ans=0.2 2024-08-20 19:53:36,412 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 26 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-20 19:53:59,467 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.whiten.whitening_limit, batch_count=4933090.0, ans=12.0 2024-08-20 19:54:04,923 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.03 vs. limit=6.0 2024-08-20 19:54:08,030 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 4350, loss[loss=0.0973, beats_loss=0.01154, ecapa_loss=0.0001536, whisper_loss=0.08422, over 20769.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0107, ecapa_loss=0.0001365, whisper_loss=0.08943, over 3806248.42 frames. 
], batch size: 89, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:54:24,579 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4933290.0, ans=0.125 2024-08-20 19:54:43,849 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4933390.0, ans=0.125 2024-08-20 19:55:04,294 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.347e+01 2.590e+01 2.980e+01 2.269e+03, threshold=5.180e+01, percent-clipped=1.0 2024-08-20 19:55:05,183 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4933490.0, ans=0.125 2024-08-20 19:55:26,660 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 29 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-20 19:55:32,102 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.941e+00 2024-08-20 19:55:35,532 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 4400, loss[loss=0.1158, beats_loss=0.00878, ecapa_loss=0.0001318, whisper_loss=0.1057, over 21059.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01063, ecapa_loss=0.0001368, whisper_loss=0.08963, over 3812129.17 frames. ], batch size: 81, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:56:02,962 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 19 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-20 19:56:10,855 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 
16 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-20 19:56:21,618 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4933890.0, ans=0.125 2024-08-20 19:56:34,197 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4933990.0, ans=0.04949747468305833 2024-08-20 19:56:38,160 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-20 19:57:05,103 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=4934190.0, ans=0.05 2024-08-20 19:57:05,142 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4934190.0, ans=0.1 2024-08-20 19:57:05,943 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 4450, loss[loss=0.09845, beats_loss=0.01023, ecapa_loss=0.000137, whisper_loss=0.08685, over 23735.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01055, ecapa_loss=0.0001367, whisper_loss=0.08928, over 3799997.55 frames. ], batch size: 93, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:57:28,227 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 25 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-20 19:57:32,657 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.83 vs. 
limit=15.0 2024-08-20 19:57:53,321 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4934390.0, ans=0.125 2024-08-20 19:58:01,028 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.622e+01 2.339e+01 2.669e+01 2.965e+01 4.502e+01, threshold=5.338e+01, percent-clipped=0.0 2024-08-20 19:58:30,646 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 4500, loss[loss=0.1105, beats_loss=0.008419, ecapa_loss=0.0001444, whisper_loss=0.1007, over 23156.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01045, ecapa_loss=0.0001373, whisper_loss=0.08993, over 3808178.25 frames. ], batch size: 90, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:58:45,430 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4934690.0, ans=0.125 2024-08-20 19:58:55,553 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4934790.0, ans=0.035 2024-08-20 19:59:06,606 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4934890.0, ans=0.125 2024-08-20 19:59:10,157 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4934890.0, ans=0.125 2024-08-20 19:59:17,008 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4934890.0, ans=0.125 2024-08-20 19:59:18,817 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4934890.0, ans=0.1 2024-08-20 19:59:27,457 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.36 vs. 
limit=15.0 2024-08-20 19:59:33,650 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4934990.0, ans=0.1 2024-08-20 19:59:47,808 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.96 vs. limit=15.0 2024-08-20 19:59:54,420 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 4550, loss[loss=0.1067, beats_loss=0.0104, ecapa_loss=0.0001668, whisper_loss=0.09468, over 21294.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01042, ecapa_loss=0.0001374, whisper_loss=0.09004, over 3824163.02 frames. ], batch size: 89, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:00:07,040 INFO [train_multi_KD3.py:845] (1/4) A total of 52 cuts. 19 from LS+wenet, 10 from Vox, 23 fro AS 2024-08-20 20:00:08,558 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 24 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-20 20:00:50,285 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.956e+01 2.294e+01 2.455e+01 2.830e+01 3.953e+01, threshold=4.911e+01, percent-clipped=0.0 2024-08-20 20:01:03,428 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 19 from LS+wenet, 10 from Vox, 24 fro AS 2024-08-20 20:01:08,423 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 23 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-20 20:01:22,203 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 4600, loss[loss=0.1006, beats_loss=0.01032, ecapa_loss=0.0001387, whisper_loss=0.08885, over 20987.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01033, ecapa_loss=0.0001368, whisper_loss=0.09068, over 3847041.40 frames. 
], batch size: 82, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:01:24,923 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4935690.0, ans=0.0 2024-08-20 20:01:35,843 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4935690.0, ans=0.2 2024-08-20 20:01:39,060 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4935790.0, ans=0.2 2024-08-20 20:01:47,955 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4935790.0, ans=0.125 2024-08-20 20:01:53,057 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4935790.0, ans=0.2 2024-08-20 20:01:58,464 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.59 vs. limit=15.0 2024-08-20 20:02:28,927 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4935990.0, ans=0.1 2024-08-20 20:02:32,018 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4936090.0, ans=0.125 2024-08-20 20:02:48,899 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 4650, loss[loss=0.1071, beats_loss=0.009023, ecapa_loss=0.0001432, whisper_loss=0.09667, over 23126.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01036, ecapa_loss=0.0001375, whisper_loss=0.09019, over 3838411.24 frames. ], batch size: 92, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:02:52,759 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 
14 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-20 20:02:52,946 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4936190.0, ans=0.125 2024-08-20 20:03:16,830 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4936290.0, ans=0.125 2024-08-20 20:03:18,029 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 31 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-20 20:03:43,916 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.849e+01 2.237e+01 2.528e+01 2.827e+01 5.668e+01, threshold=5.055e+01, percent-clipped=2.0 2024-08-20 20:03:58,286 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 20 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-20 20:04:15,051 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 4700, loss[loss=0.08346, beats_loss=0.01049, ecapa_loss=0.0001252, whisper_loss=0.07172, over 14369.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01041, ecapa_loss=0.0001385, whisper_loss=0.0891, over 3779248.06 frames. ], batch size: 53, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:04:18,609 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 19 from LS+wenet, 26 from Vox, 24 fro AS 2024-08-20 20:04:42,629 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.23 vs. limit=15.0 2024-08-20 20:04:54,784 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.11 vs. limit=15.0 2024-08-20 20:05:00,152 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.95 vs. 
limit=6.0 2024-08-20 20:05:33,394 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4937090.0, ans=0.2 2024-08-20 20:05:36,613 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 18 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-20 20:05:39,982 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 4750, loss[loss=0.1086, beats_loss=0.0104, ecapa_loss=0.0001343, whisper_loss=0.09688, over 21564.00 frames. ], tot_loss[loss=0.09996, beats_loss=0.01042, ecapa_loss=0.0001391, whisper_loss=0.08815, over 3769096.78 frames. ], batch size: 89, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:05:40,585 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4937190.0, ans=0.0 2024-08-20 20:05:45,897 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 27 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-20 20:05:49,131 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 26 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-20 20:06:03,046 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4937290.0, ans=0.2 2024-08-20 20:06:37,056 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.283e+01 2.561e+01 2.830e+01 4.199e+01, threshold=5.121e+01, percent-clipped=0.0 2024-08-20 20:06:39,534 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4937490.0, ans=0.125 2024-08-20 20:06:48,138 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 
22 from LS+wenet, 25 from Vox, 47 fro AS 2024-08-20 20:06:53,476 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4937590.0, ans=0.2 2024-08-20 20:07:09,505 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 4800, loss[loss=0.08764, beats_loss=0.01335, ecapa_loss=0.0001418, whisper_loss=0.07288, over 19819.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01045, ecapa_loss=0.0001397, whisper_loss=0.0883, over 3780615.41 frames. ], batch size: 86, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:07:10,007 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4937690.0, ans=0.1 2024-08-20 20:07:47,356 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 23 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-20 20:08:02,678 WARNING [optim.py:496] (1/4) Scaling gradients by 0.025113865733146667, model_norm_threshold=51.21064758300781 2024-08-20 20:08:02,834 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.29, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.205e+06, grad_sumsq=1.205e+06, orig_rms_sq=1.000e+00 2024-08-20 20:08:36,424 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 25 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-20 20:08:37,929 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 4850, loss[loss=0.1052, beats_loss=0.00908, ecapa_loss=0.000142, whisper_loss=0.09466, over 20404.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01049, ecapa_loss=0.0001394, whisper_loss=0.08818, over 3798385.60 frames. 
], batch size: 81, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:08:44,720 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4938190.0, ans=0.125 2024-08-20 20:08:45,055 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.06 vs. limit=15.0 2024-08-20 20:09:00,535 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 19 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-20 20:09:00,811 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4938290.0, ans=0.1 2024-08-20 20:09:18,218 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 20:09:25,165 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4938390.0, ans=0.0 2024-08-20 20:09:28,186 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4938390.0, ans=0.125 2024-08-20 20:09:30,460 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4938490.0, ans=0.125 2024-08-20 20:09:34,539 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.939e+01 2.379e+01 2.535e+01 2.834e+01 2.039e+03, threshold=5.069e+01, percent-clipped=1.0 2024-08-20 20:09:37,028 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4938490.0, ans=0.125 2024-08-20 20:09:38,881 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4938490.0, ans=0.1 2024-08-20 20:09:40,332 INFO [scaling.py:1120] (1/4) WithLoss: 
name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 20:09:40,476 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.90 vs. limit=12.0 2024-08-20 20:09:53,614 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 22 from LS+wenet, 32 from Vox, 27 fro AS 2024-08-20 20:10:04,028 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 32 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-20 20:10:05,582 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 4900, loss[loss=0.1177, beats_loss=0.009347, ecapa_loss=0.0001429, whisper_loss=0.1069, over 21879.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01045, ecapa_loss=0.0001404, whisper_loss=0.08822, over 3775587.45 frames. ], batch size: 87, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:10:10,826 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 17 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-20 20:10:25,156 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4938790.0, ans=0.2 2024-08-20 20:10:42,789 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 20:10:48,468 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4938890.0, ans=0.2 2024-08-20 20:10:48,499 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.757e+05 2024-08-20 20:11:02,395 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4938990.0, ans=0.125 2024-08-20 20:11:34,775 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 4950, loss[loss=0.1288, beats_loss=0.01159, ecapa_loss=0.0001059, whisper_loss=0.1162, over 18936.00 frames. 
], tot_loss[loss=0.1012, beats_loss=0.01039, ecapa_loss=0.0001394, whisper_loss=0.0894, over 3787236.60 frames. ], batch size: 71, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:12:01,566 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.01 vs. limit=15.0 2024-08-20 20:12:02,941 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 15 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-20 20:12:18,647 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 20 from LS+wenet, 11 from Vox, 20 fro AS 2024-08-20 20:12:20,510 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 30 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-20 20:12:22,726 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 24 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-20 20:12:32,889 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.356e+01 2.576e+01 2.948e+01 1.126e+02, threshold=5.153e+01, percent-clipped=1.0 2024-08-20 20:12:33,971 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4939490.0, ans=0.0 2024-08-20 20:12:37,231 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.82 vs. limit=10.0 2024-08-20 20:12:56,197 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4939590.0, ans=0.125 2024-08-20 20:12:58,352 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 20:13:05,120 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 5000, loss[loss=0.1026, beats_loss=0.01055, ecapa_loss=0.0001437, whisper_loss=0.09057, over 21590.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01038, ecapa_loss=0.0001388, whisper_loss=0.08981, over 3786289.18 frames. 
], batch size: 89, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:13:21,222 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 33 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-20 20:13:21,294 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4939790.0, ans=0.125 2024-08-20 20:13:50,948 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.54 vs. limit=8.0 2024-08-20 20:13:53,971 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4939890.0, ans=0.125 2024-08-20 20:13:57,382 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4939990.0, ans=0.1 2024-08-20 20:14:05,406 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.32 vs. limit=15.0 2024-08-20 20:14:27,768 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4940090.0, ans=0.0 2024-08-20 20:14:36,052 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 5050, loss[loss=0.1154, beats_loss=0.008682, ecapa_loss=0.0001478, whisper_loss=0.1053, over 23277.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01037, ecapa_loss=0.0001381, whisper_loss=0.0897, over 3763593.38 frames. ], batch size: 90, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:14:43,764 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4940190.0, ans=0.1 2024-08-20 20:14:43,803 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4940190.0, ans=0.2 2024-08-20 20:14:46,738 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 
21 from LS+wenet, 33 from Vox, 39 fro AS 2024-08-20 20:14:55,609 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 31 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-20 20:15:03,170 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.41 vs. limit=15.0 2024-08-20 20:15:09,985 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4940390.0, ans=0.07 2024-08-20 20:15:10,441 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.02 vs. limit=15.0 2024-08-20 20:15:26,240 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 22 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-20 20:15:33,060 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.260e+01 2.507e+01 2.805e+01 5.478e+01, threshold=5.013e+01, percent-clipped=1.0 2024-08-20 20:15:52,808 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4940590.0, ans=0.0 2024-08-20 20:16:01,579 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 28 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-20 20:16:04,929 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 5100, loss[loss=0.1136, beats_loss=0.009144, ecapa_loss=0.0001291, whisper_loss=0.1032, over 23512.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01043, ecapa_loss=0.0001382, whisper_loss=0.08987, over 3795200.55 frames. 
], batch size: 91, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:16:05,502 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4940690.0, ans=0.125 2024-08-20 20:16:16,382 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2024-08-20 20:16:28,335 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4940790.0, ans=0.0 2024-08-20 20:16:33,405 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 29 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-20 20:16:52,357 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4940890.0, ans=0.1 2024-08-20 20:17:11,988 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.33 vs. limit=10.0 2024-08-20 20:17:13,268 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4941090.0, ans=0.1 2024-08-20 20:17:20,226 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4941090.0, ans=0.1 2024-08-20 20:17:22,613 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.97 vs. 
limit=15.0 2024-08-20 20:17:25,518 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4941090.0, ans=0.0 2024-08-20 20:17:30,856 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4941190.0, ans=0.125 2024-08-20 20:17:32,259 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 5150, loss[loss=0.1068, beats_loss=0.01098, ecapa_loss=0.0001493, whisper_loss=0.09428, over 17802.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01044, ecapa_loss=0.0001391, whisper_loss=0.09027, over 3819954.92 frames. ], batch size: 73, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:18:05,734 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4941390.0, ans=0.125 2024-08-20 20:18:09,290 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4941390.0, ans=0.125 2024-08-20 20:18:22,968 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 27 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-20 20:18:27,204 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.219e+01 2.541e+01 2.868e+01 3.859e+01, threshold=5.083e+01, percent-clipped=0.0 2024-08-20 20:18:31,730 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=4941490.0, ans=0.02 2024-08-20 20:18:39,790 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 25 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-20 20:18:48,574 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4941590.0, ans=0.0 2024-08-20 20:18:57,901 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 5200, loss[loss=0.08405, beats_loss=0.01438, ecapa_loss=9.825e-05, whisper_loss=0.06869, over 19047.00 frames. 
], tot_loss[loss=0.1018, beats_loss=0.01049, ecapa_loss=0.0001389, whisper_loss=0.08995, over 3853070.08 frames. ], batch size: 77, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:19:03,923 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 27 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-20 20:19:16,689 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4941790.0, ans=0.0 2024-08-20 20:19:42,683 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4941890.0, ans=0.0 2024-08-20 20:19:49,343 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 32 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-20 20:20:14,875 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4942090.0, ans=0.125 2024-08-20 20:20:18,406 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4942090.0, ans=0.125 2024-08-20 20:20:19,609 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 30 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-20 20:20:26,006 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 5250, loss[loss=0.07962, beats_loss=0.01454, ecapa_loss=9.358e-05, whisper_loss=0.06415, over 23623.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01048, ecapa_loss=0.0001392, whisper_loss=0.08969, over 3837085.53 frames. ], batch size: 93, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:20:39,194 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.38 vs. 
limit=15.0 2024-08-20 20:20:44,594 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4942290.0, ans=0.95 2024-08-20 20:20:51,720 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4942290.0, ans=0.1 2024-08-20 20:21:11,022 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.87 vs. limit=15.0 2024-08-20 20:21:22,411 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.805e+01 2.241e+01 2.519e+01 2.751e+01 3.972e+01, threshold=5.038e+01, percent-clipped=0.0 2024-08-20 20:21:35,800 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4942590.0, ans=0.125 2024-08-20 20:21:40,692 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4942590.0, ans=0.04949747468305833 2024-08-20 20:21:46,921 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4942590.0, ans=0.2 2024-08-20 20:21:53,475 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 5300, loss[loss=0.1229, beats_loss=0.009282, ecapa_loss=0.0001385, whisper_loss=0.1123, over 22731.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01054, ecapa_loss=0.0001382, whisper_loss=0.08974, over 3860221.18 frames. ], batch size: 87, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:22:06,524 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 
27 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-20 20:22:08,949 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4942690.0, ans=0.125 2024-08-20 20:22:18,345 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4942790.0, ans=0.2 2024-08-20 20:22:23,642 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4942790.0, ans=0.0 2024-08-20 20:22:30,119 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4942890.0, ans=0.0 2024-08-20 20:22:42,874 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4942890.0, ans=0.125 2024-08-20 20:22:47,648 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 17 from LS+wenet, 18 from Vox, 16 fro AS 2024-08-20 20:22:48,007 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4942990.0, ans=0.0 2024-08-20 20:23:00,027 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4942990.0, ans=0.125 2024-08-20 20:23:17,427 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4943090.0, ans=0.2 2024-08-20 20:23:19,012 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 35 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-20 20:23:22,212 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 5350, loss[loss=0.08503, beats_loss=0.01077, ecapa_loss=0.000118, whisper_loss=0.07308, over 18178.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01051, ecapa_loss=0.0001378, whisper_loss=0.08925, over 3836564.38 frames. 
], batch size: 69, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:23:39,212 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=4943290.0, ans=10.0 2024-08-20 20:23:47,538 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 21 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-20 20:24:02,706 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 20:24:06,334 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4943390.0, ans=0.125 2024-08-20 20:24:13,616 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 20:24:18,771 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=4943490.0, ans=0.02 2024-08-20 20:24:19,650 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.338e+01 2.503e+01 2.804e+01 4.042e+01, threshold=5.006e+01, percent-clipped=0.0 2024-08-20 20:24:28,825 INFO [train_multi_KD3.py:845] (1/4) A total of 54 cuts. 16 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-20 20:24:51,252 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 5400, loss[loss=0.1085, beats_loss=0.01042, ecapa_loss=0.0001609, whisper_loss=0.09647, over 21559.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01048, ecapa_loss=0.0001374, whisper_loss=0.08904, over 3798471.91 frames. ], batch size: 90, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:24:55,475 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4943690.0, ans=0.0 2024-08-20 20:25:05,934 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.75 vs. 
limit=22.5 2024-08-20 20:25:07,146 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 19 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-20 20:25:12,639 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4943790.0, ans=0.125 2024-08-20 20:25:23,256 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.92 vs. limit=15.0 2024-08-20 20:25:43,480 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 26 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-20 20:25:54,730 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.23 vs. limit=15.0 2024-08-20 20:25:59,681 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.34 vs. limit=15.0 2024-08-20 20:26:17,082 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4944190.0, ans=0.125 2024-08-20 20:26:17,970 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 5450, loss[loss=0.1181, beats_loss=0.008338, ecapa_loss=0.0001605, whisper_loss=0.1082, over 14483.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01043, ecapa_loss=0.0001373, whisper_loss=0.08947, over 3820553.92 frames. 
], batch size: 59, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:26:20,486 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4944190.0, ans=0.0 2024-08-20 20:26:23,892 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4944190.0, ans=0.1 2024-08-20 20:26:23,950 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4944190.0, ans=0.125 2024-08-20 20:26:32,653 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.73 vs. limit=8.0 2024-08-20 20:26:40,308 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.91 vs. limit=15.0 2024-08-20 20:26:53,785 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.95 vs. limit=22.5 2024-08-20 20:27:00,796 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 26 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-20 20:27:12,856 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4944490.0, ans=0.125 2024-08-20 20:27:17,581 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.707e+01 2.208e+01 2.419e+01 2.750e+01 4.613e+01, threshold=4.839e+01, percent-clipped=0.0 2024-08-20 20:27:23,873 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 25 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-20 20:27:24,089 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4944490.0, ans=0.2 2024-08-20 20:27:32,418 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 
27 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-20 20:27:42,699 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.09 vs. limit=15.0 2024-08-20 20:27:48,472 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 5500, loss[loss=0.07958, beats_loss=0.01102, ecapa_loss=0.0001242, whisper_loss=0.06732, over 16513.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01033, ecapa_loss=0.0001374, whisper_loss=0.09029, over 3809195.10 frames. ], batch size: 65, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:27:51,027 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4944690.0, ans=0.04949747468305833 2024-08-20 20:28:08,525 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4944790.0, ans=0.2 2024-08-20 20:28:20,302 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 
12 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-20 20:28:41,644 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4944990.0, ans=0.0 2024-08-20 20:28:43,394 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4944990.0, ans=0.125 2024-08-20 20:28:48,526 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4944990.0, ans=0.0 2024-08-20 20:28:52,188 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4944990.0, ans=0.05 2024-08-20 20:29:14,938 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4945190.0, ans=0.2 2024-08-20 20:29:16,157 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 5550, loss[loss=0.1052, beats_loss=0.008273, ecapa_loss=0.0001859, whisper_loss=0.09506, over 13264.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0103, ecapa_loss=0.0001375, whisper_loss=0.09075, over 3810622.23 frames. 
], batch size: 54, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:29:49,230 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4945290.0, ans=0.0 2024-08-20 20:29:50,927 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4945390.0, ans=0.125 2024-08-20 20:29:59,679 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4945390.0, ans=0.0 2024-08-20 20:30:06,449 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4945390.0, ans=0.0 2024-08-20 20:30:12,482 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.738e+01 2.276e+01 2.520e+01 2.741e+01 3.796e+01, threshold=5.039e+01, percent-clipped=0.0 2024-08-20 20:30:32,650 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4945590.0, ans=0.125 2024-08-20 20:30:44,035 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 5600, loss[loss=0.08831, beats_loss=0.01101, ecapa_loss=0.0001381, whisper_loss=0.07592, over 16506.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01031, ecapa_loss=0.0001381, whisper_loss=0.08994, over 3824693.59 frames. ], batch size: 68, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:30:48,478 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4945690.0, ans=0.0 2024-08-20 20:30:49,646 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 20 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-20 20:31:02,713 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4945790.0, ans=0.125 2024-08-20 20:31:08,918 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 
21 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-20 20:31:09,998 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4945790.0, ans=0.1 2024-08-20 20:31:33,309 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=7.437e+00 2024-08-20 20:31:40,339 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4945990.0, ans=0.125 2024-08-20 20:31:40,649 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. limit=6.0 2024-08-20 20:31:41,940 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4945990.0, ans=0.125 2024-08-20 20:31:59,624 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0 2024-08-20 20:32:11,393 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 33 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-20 20:32:12,738 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 5650, loss[loss=0.1178, beats_loss=0.008762, ecapa_loss=0.0001591, whisper_loss=0.1074, over 22366.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01034, ecapa_loss=0.0001394, whisper_loss=0.08996, over 3829888.36 frames. ], batch size: 91, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:32:13,453 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4946190.0, ans=0.1 2024-08-20 20:32:16,885 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 
29 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-20 20:32:37,288 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4946290.0, ans=0.0 2024-08-20 20:32:47,875 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4946390.0, ans=0.125 2024-08-20 20:32:51,010 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4946390.0, ans=0.2 2024-08-20 20:32:58,370 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4946390.0, ans=0.125 2024-08-20 20:33:09,723 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.323e+01 2.509e+01 2.836e+01 4.746e+01, threshold=5.018e+01, percent-clipped=0.0 2024-08-20 20:33:16,193 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4946490.0, ans=0.0 2024-08-20 20:33:18,550 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4946490.0, ans=0.1 2024-08-20 20:33:25,570 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4946590.0, ans=0.125 2024-08-20 20:33:43,475 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 5700, loss[loss=0.0968, beats_loss=0.01149, ecapa_loss=0.0001702, whisper_loss=0.08361, over 20663.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01032, ecapa_loss=0.0001392, whisper_loss=0.09025, over 3853354.63 frames. 
], batch size: 92, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:33:55,530 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4946690.0, ans=0.1 2024-08-20 20:34:22,815 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4946890.0, ans=0.125 2024-08-20 20:34:49,225 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4946990.0, ans=0.0 2024-08-20 20:34:51,867 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 24 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-20 20:35:10,972 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 39 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-20 20:35:11,993 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4947090.0, ans=0.09899494936611666 2024-08-20 20:35:17,988 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.10 vs. limit=15.0 2024-08-20 20:35:19,225 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4947190.0, ans=0.125 2024-08-20 20:35:19,979 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 5750, loss[loss=0.0897, beats_loss=0.009765, ecapa_loss=0.000161, whisper_loss=0.07832, over 20442.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01028, ecapa_loss=0.0001392, whisper_loss=0.09027, over 3846267.53 frames. ], batch size: 83, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:35:57,712 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 
27 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-20 20:36:02,204 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4947390.0, ans=0.0 2024-08-20 20:36:23,812 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.253e+01 2.565e+01 2.811e+01 3.552e+01, threshold=5.130e+01, percent-clipped=0.0 2024-08-20 20:36:34,831 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4947490.0, ans=0.125 2024-08-20 20:36:34,832 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4947490.0, ans=0.2 2024-08-20 20:36:53,060 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4947590.0, ans=0.0 2024-08-20 20:36:57,149 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.89 vs. limit=15.0 2024-08-20 20:36:57,629 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 5800, loss[loss=0.1131, beats_loss=0.006805, ecapa_loss=0.0001719, whisper_loss=0.1045, over 15584.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01029, ecapa_loss=0.0001392, whisper_loss=0.09043, over 3855030.77 frames. ], batch size: 62, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:37:20,373 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 
18 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-20 20:37:26,023 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4947790.0, ans=0.0 2024-08-20 20:37:58,086 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4947990.0, ans=0.125 2024-08-20 20:38:02,394 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4947990.0, ans=0.125 2024-08-20 20:38:03,308 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 30 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-20 20:38:17,120 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 17 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-20 20:38:27,183 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4948090.0, ans=0.125 2024-08-20 20:38:30,699 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.87 vs. limit=15.0 2024-08-20 20:38:33,083 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 22 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-20 20:38:36,450 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 5850, loss[loss=0.116, beats_loss=0.008969, ecapa_loss=0.0001348, whisper_loss=0.1057, over 22296.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01025, ecapa_loss=0.00014, whisper_loss=0.09047, over 3865372.42 frames. 
], batch size: 87, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:38:46,522 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4948190.0, ans=0.125 2024-08-20 20:39:19,238 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4948390.0, ans=0.07 2024-08-20 20:39:33,621 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.744e+01 2.256e+01 2.476e+01 2.710e+01 3.923e+01, threshold=4.952e+01, percent-clipped=0.0 2024-08-20 20:39:44,842 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4948490.0, ans=0.1 2024-08-20 20:39:46,382 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4948590.0, ans=0.0 2024-08-20 20:39:46,482 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4948590.0, ans=0.2 2024-08-20 20:39:51,049 WARNING [optim.py:496] (1/4) Scaling gradients by 0.021723005920648575, model_norm_threshold=49.52134323120117 2024-08-20 20:39:51,209 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.25, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.306e+06, grad_sumsq=3.974e+05, orig_rms_sq=3.286e+00 2024-08-20 20:40:06,627 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 5900, loss[loss=0.1163, beats_loss=0.008902, ecapa_loss=0.0001055, whisper_loss=0.1064, over 17748.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01026, ecapa_loss=0.0001406, whisper_loss=0.0903, over 3845320.25 frames. ], batch size: 64, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:40:28,520 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 35 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-20 20:40:55,686 INFO [train_multi_KD3.py:845] (1/4) A total of 52 cuts. 
14 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-20 20:41:24,148 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 24 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-20 20:41:24,831 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4949090.0, ans=0.1 2024-08-20 20:41:36,087 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 5950, loss[loss=0.1334, beats_loss=0.008133, ecapa_loss=0.0001344, whisper_loss=0.1239, over 23800.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01038, ecapa_loss=0.0001391, whisper_loss=0.08956, over 3834210.63 frames. ], batch size: 91, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:41:48,434 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 32 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-20 20:41:52,560 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4949290.0, ans=0.0 2024-08-20 20:41:59,691 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4949290.0, ans=0.125 2024-08-20 20:42:04,303 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.84 vs. limit=15.0 2024-08-20 20:42:07,681 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4949290.0, ans=0.125 2024-08-20 20:42:11,037 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=4949390.0, ans=0.5 2024-08-20 20:42:19,637 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 
24 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-20 20:42:24,419 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4949390.0, ans=0.125 2024-08-20 20:42:27,800 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4949390.0, ans=0.2 2024-08-20 20:42:33,654 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.390e+01 2.635e+01 2.817e+01 2.280e+03, threshold=5.271e+01, percent-clipped=1.0 2024-08-20 20:42:45,651 INFO [train_multi_KD3.py:845] (1/4) A total of 52 cuts. 9 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-20 20:42:48,659 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0 2024-08-20 20:42:52,427 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4949590.0, ans=0.07 2024-08-20 20:42:52,432 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4949590.0, ans=0.125 2024-08-20 20:42:59,739 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4949590.0, ans=0.125 2024-08-20 20:43:06,088 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 6000, loss[loss=0.09022, beats_loss=0.01169, ecapa_loss=0.0001308, whisper_loss=0.07723, over 16411.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01038, ecapa_loss=0.0001398, whisper_loss=0.08929, over 3816712.11 frames. 
], batch size: 66, lr: 1.80e-03, grad_scale: 1.152921504606847e+18 2024-08-20 20:43:06,089 INFO [train_multi_KD3.py:1140] (1/4) Computing validation loss 2024-08-20 20:43:58,009 INFO [train_multi_KD3.py:1150] (1/4) Epoch 34, validation on ASR_libri: loss=0.2541, beats_loss=0, ecapa_loss=0.0005083, whisper_loss=0.249, over 931116.00 frames. 2024-08-20 20:44:22,182 INFO [train_multi_KD3.py:1150] (1/4) Epoch 34, validation on SV_voxceleb1: loss=0.003999, beats_loss=0, ecapa_loss=0.0003999, whisper_loss=0, over 944235.00 frames. 2024-08-20 20:45:57,818 INFO [train_multi_KD3.py:1150] (1/4) Epoch 34, validation on AT_audioset: loss=0.02294, beats_loss=0.02294, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-20 20:45:57,821 INFO [train_multi_KD3.py:1156] (1/4) Maximum memory allocated so far is 30838MB 2024-08-20 20:46:43,261 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4949890.0, ans=0.0 2024-08-20 20:47:00,924 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 20 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-20 20:47:02,126 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4949990.0, ans=0.125 2024-08-20 20:47:04,654 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4949990.0, ans=0.0 2024-08-20 20:47:06,956 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-20 20:47:14,718 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 
16 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-20 20:47:17,633 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4950090.0, ans=0.0 2024-08-20 20:47:36,895 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 6050, loss[loss=0.1272, beats_loss=0.008766, ecapa_loss=0.000149, whisper_loss=0.1169, over 22458.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01039, ecapa_loss=0.0001398, whisper_loss=0.08866, over 3814683.23 frames. ], batch size: 88, lr: 1.80e-03, grad_scale: 1.152921504606847e+18 2024-08-20 20:48:25,506 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 32 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-20 20:48:33,051 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4950390.0, ans=0.125 2024-08-20 20:48:37,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4950390.0, ans=0.0 2024-08-20 20:48:38,049 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4950390.0, ans=0.0 2024-08-20 20:48:43,652 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4950390.0, ans=0.125 2024-08-20 20:48:53,427 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.341e+01 2.597e+01 2.876e+01 5.831e+01, threshold=5.193e+01, percent-clipped=1.0 2024-08-20 20:48:56,121 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4950490.0, ans=0.0 2024-08-20 20:49:04,532 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4950490.0, ans=0.125 2024-08-20 20:49:17,763 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 
24 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-20 20:49:27,819 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 6100, loss[loss=0.09113, beats_loss=0.01291, ecapa_loss=0.0001361, whisper_loss=0.07686, over 16363.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01037, ecapa_loss=0.0001401, whisper_loss=0.08938, over 3854465.66 frames. ], batch size: 69, lr: 1.80e-03, grad_scale: 1.152921504606847e+18 2024-08-20 20:49:40,535 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4950690.0, ans=0.2 2024-08-20 20:49:58,952 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4950790.0, ans=0.125 2024-08-20 20:50:01,379 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.whiten.whitening_limit, batch_count=4950790.0, ans=12.0 2024-08-20 20:50:19,352 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.35 vs. limit=10.0 2024-08-20 20:50:21,323 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4950890.0, ans=0.0 2024-08-20 20:50:34,509 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 20:50:41,874 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.28 vs. limit=22.5 2024-08-20 20:50:45,849 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2024-08-20 20:51:17,214 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 6150, loss[loss=0.09839, beats_loss=0.01017, ecapa_loss=0.0001516, whisper_loss=0.0867, over 20404.00 frames. 
], tot_loss[loss=0.1013, beats_loss=0.01047, ecapa_loss=0.0001395, whisper_loss=0.08944, over 3842377.05 frames. ], batch size: 83, lr: 1.80e-03, grad_scale: 1.152921504606847e+18 2024-08-20 20:51:17,393 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 26 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-20 20:51:31,012 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.68 vs. limit=15.0 2024-08-20 20:51:37,252 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 24 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-20 20:51:53,546 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4951290.0, ans=0.125 2024-08-20 20:51:54,012 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.19 vs. limit=22.5 2024-08-20 20:52:01,711 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 34 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-20 20:52:05,238 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 24 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-20 20:52:07,698 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=4951390.0, ans=0.1 2024-08-20 20:52:27,828 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.295e+01 2.472e+01 2.689e+01 4.282e+01, threshold=4.944e+01, percent-clipped=0.0 2024-08-20 20:52:39,180 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 28 from LS+wenet, 11 from Vox, 52 fro AS 2024-08-20 20:53:06,733 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 6200, loss[loss=0.1172, beats_loss=0.009799, ecapa_loss=0.0001168, whisper_loss=0.1062, over 23724.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01047, ecapa_loss=0.0001381, whisper_loss=0.09002, over 3857332.55 frames. 
], batch size: 89, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:53:07,050 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4951690.0, ans=0.015 2024-08-20 20:53:10,154 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4951690.0, ans=0.125 2024-08-20 20:53:20,651 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.82 vs. limit=15.0 2024-08-20 20:54:00,815 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4951890.0, ans=0.125 2024-08-20 20:54:00,867 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4951890.0, ans=0.125 2024-08-20 20:54:03,962 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 17 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-20 20:54:16,280 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 31 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-20 20:54:38,269 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4952090.0, ans=0.1 2024-08-20 20:54:39,608 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 14 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-20 20:54:52,517 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4952090.0, ans=0.1 2024-08-20 20:54:56,583 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 6250, loss[loss=0.1199, beats_loss=0.008234, ecapa_loss=0.0001172, whisper_loss=0.1105, over 19685.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01042, ecapa_loss=0.0001386, whisper_loss=0.09022, over 3870249.90 frames. 
], batch size: 70, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:55:02,452 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 21 from LS+wenet, 11 from Vox, 19 fro AS 2024-08-20 20:55:16,808 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 28 from LS+wenet, 14 from Vox, 18 fro AS 2024-08-20 20:55:35,792 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4952290.0, ans=0.125 2024-08-20 20:55:36,789 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 27 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-20 20:55:39,007 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 24 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-20 20:55:40,974 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4952390.0, ans=0.1 2024-08-20 20:55:42,631 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=4.336e+00 2024-08-20 20:56:06,123 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.741e+01 2.296e+01 2.528e+01 2.851e+01 2.776e+02, threshold=5.056e+01, percent-clipped=4.0 2024-08-20 20:56:07,779 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4952490.0, ans=0.2 2024-08-20 20:56:18,379 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 20 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-20 20:56:23,157 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4952590.0, ans=0.0 2024-08-20 20:56:46,966 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. 
limit=6.0 2024-08-20 20:56:47,423 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 6300, loss[loss=0.09126, beats_loss=0.01161, ecapa_loss=0.0001315, whisper_loss=0.07833, over 21031.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01042, ecapa_loss=0.0001394, whisper_loss=0.09012, over 3842898.82 frames. ], batch size: 85, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:57:14,241 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4952790.0, ans=0.125 2024-08-20 20:57:24,050 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4952790.0, ans=0.125 2024-08-20 20:57:26,634 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4952790.0, ans=0.1 2024-08-20 20:58:20,816 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 33 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-20 20:58:44,207 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 6350, loss[loss=0.09384, beats_loss=0.01339, ecapa_loss=0.0001129, whisper_loss=0.07932, over 22845.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01045, ecapa_loss=0.0001396, whisper_loss=0.09012, over 3849276.51 frames. ], batch size: 91, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:58:57,923 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4953190.0, ans=0.125 2024-08-20 20:59:01,122 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 
18 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-20 20:59:07,474 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4953290.0, ans=0.125 2024-08-20 20:59:47,770 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4953490.0, ans=0.2 2024-08-20 20:59:51,960 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4953490.0, ans=0.07 2024-08-20 20:59:52,782 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.372e+01 2.595e+01 2.941e+01 1.196e+02, threshold=5.191e+01, percent-clipped=6.0 2024-08-20 21:00:15,761 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 34 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-20 21:00:29,451 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 6400, loss[loss=0.1012, beats_loss=0.0106, ecapa_loss=0.0001222, whisper_loss=0.08934, over 15706.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01039, ecapa_loss=0.0001399, whisper_loss=0.09019, over 3826220.26 frames. ], batch size: 63, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:01:00,985 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4953790.0, ans=0.0 2024-08-20 21:01:43,918 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4953990.0, ans=0.125 2024-08-20 21:02:02,183 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4954090.0, ans=0.2 2024-08-20 21:02:03,260 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 24 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-20 21:02:08,844 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 6450, loss[loss=0.1056, beats_loss=0.009769, ecapa_loss=0.0001515, whisper_loss=0.09434, over 23127.00 frames. 
], tot_loss[loss=0.102, beats_loss=0.0104, ecapa_loss=0.0001403, whisper_loss=0.09017, over 3829503.56 frames. ], batch size: 95, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:02:19,647 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.97 vs. limit=15.0 2024-08-20 21:02:28,612 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 28 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-20 21:02:30,379 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-20 21:02:33,763 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.20 vs. limit=15.0 2024-08-20 21:02:42,892 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.88 vs. limit=22.5 2024-08-20 21:02:47,781 INFO [train_multi_KD3.py:845] (1/4) A total of 54 cuts. 18 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-20 21:02:58,320 INFO [train_multi_KD3.py:845] (1/4) A total of 49 cuts. 15 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-20 21:03:00,872 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4954390.0, ans=0.125 2024-08-20 21:03:03,109 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.09 vs. 
limit=22.5 2024-08-20 21:03:06,713 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4954490.0, ans=0.1 2024-08-20 21:03:11,010 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.299e+01 2.553e+01 2.896e+01 1.351e+02, threshold=5.106e+01, percent-clipped=1.0 2024-08-20 21:03:14,203 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4954490.0, ans=0.125 2024-08-20 21:03:20,392 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4954490.0, ans=0.2 2024-08-20 21:03:27,164 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4954590.0, ans=0.125 2024-08-20 21:03:32,841 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4954590.0, ans=0.0 2024-08-20 21:03:35,678 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-20 21:03:39,460 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 25 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-20 21:03:44,644 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 6500, loss[loss=0.1027, beats_loss=0.008778, ecapa_loss=0.0001633, whisper_loss=0.09233, over 18085.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01038, ecapa_loss=0.000141, whisper_loss=0.08978, over 3780565.53 frames. ], batch size: 74, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:03:48,114 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.69 vs. limit=15.0 2024-08-20 21:03:48,778 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 
13 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-20 21:03:57,958 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.01 vs. limit=6.0 2024-08-20 21:04:08,471 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 24 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-20 21:04:55,114 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4954990.0, ans=0.0 2024-08-20 21:04:57,987 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.20 vs. limit=15.0 2024-08-20 21:05:18,878 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4955090.0, ans=0.0 2024-08-20 21:05:19,120 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.42 vs. limit=15.0 2024-08-20 21:05:27,168 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 24 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-20 21:05:30,062 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 24 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-20 21:05:40,969 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 6550, loss[loss=0.0915, beats_loss=0.00951, ecapa_loss=0.0001296, whisper_loss=0.08069, over 13847.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01037, ecapa_loss=0.0001408, whisper_loss=0.08998, over 3771312.18 frames. ], batch size: 54, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:05:43,615 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 24 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-20 21:05:54,615 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 
17 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-20 21:06:00,471 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4955190.0, ans=0.0 2024-08-20 21:06:13,775 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 29 from LS+wenet, 13 from Vox, 39 fro AS 2024-08-20 21:06:24,703 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4955290.0, ans=0.125 2024-08-20 21:06:57,269 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.272e+01 2.491e+01 2.852e+01 4.089e+01, threshold=4.982e+01, percent-clipped=0.0 2024-08-20 21:06:58,571 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4955490.0, ans=0.1 2024-08-20 21:07:05,001 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4955490.0, ans=0.125 2024-08-20 21:07:05,970 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 17 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-20 21:07:20,156 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4955590.0, ans=0.0 2024-08-20 21:07:37,682 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 6600, loss[loss=0.097, beats_loss=0.01142, ecapa_loss=0.0001004, whisper_loss=0.08458, over 14368.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01046, ecapa_loss=0.0001409, whisper_loss=0.0896, over 3808262.32 frames. ], batch size: 52, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:07:51,243 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 28 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-20 21:08:22,189 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 22 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-20 21:09:08,011 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 
27 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-20 21:09:09,441 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4955990.0, ans=0.025 2024-08-20 21:09:10,471 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 23 from LS+wenet, 9 from Vox, 27 fro AS 2024-08-20 21:09:36,112 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 30 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-20 21:09:47,607 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 6650, loss[loss=0.1077, beats_loss=0.0115, ecapa_loss=0.000126, whisper_loss=0.09491, over 19118.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01039, ecapa_loss=0.000142, whisper_loss=0.09036, over 3799020.00 frames. ], batch size: 74, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:09:49,313 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 22 from LS+wenet, 31 from Vox, 39 fro AS 2024-08-20 21:09:52,236 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4956190.0, ans=0.0 2024-08-20 21:10:01,122 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.30 vs. limit=15.0 2024-08-20 21:10:08,732 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4956290.0, ans=0.125 2024-08-20 21:10:17,798 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 16 from LS+wenet, 32 from Vox, 45 fro AS 2024-08-20 21:10:21,431 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4956290.0, ans=0.2 2024-08-20 21:10:27,041 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.40 vs. 
limit=22.5 2024-08-20 21:10:39,161 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.66 vs. limit=15.0 2024-08-20 21:10:44,383 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 26 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-20 21:10:47,469 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.764e+01 2.323e+01 2.536e+01 2.903e+01 4.430e+01, threshold=5.073e+01, percent-clipped=0.0 2024-08-20 21:10:49,815 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 18 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-20 21:11:07,218 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 22 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-20 21:11:10,077 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 29 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-20 21:11:18,824 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 6700, loss[loss=0.104, beats_loss=0.01089, ecapa_loss=0.0001444, whisper_loss=0.09162, over 15266.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01045, ecapa_loss=0.000142, whisper_loss=0.08994, over 3837095.52 frames. ], batch size: 63, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:11:21,367 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=4956690.0, ans=0.5 2024-08-20 21:11:53,255 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.77 vs. limit=15.0 2024-08-20 21:11:55,888 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 
24 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-20 21:12:05,197 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4956890.0, ans=0.125 2024-08-20 21:12:08,266 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4956890.0, ans=0.125 2024-08-20 21:12:13,506 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 31 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-20 21:12:46,387 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 6750, loss[loss=0.09489, beats_loss=0.01391, ecapa_loss=0.0001092, whisper_loss=0.07989, over 22714.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0105, ecapa_loss=0.0001412, whisper_loss=0.0897, over 3855853.43 frames. ], batch size: 89, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:12:55,598 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4957190.0, ans=0.125 2024-08-20 21:12:57,311 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 
25 from LS+wenet, 31 from Vox, 36 fro AS 2024-08-20 21:13:06,210 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4957290.0, ans=0.125 2024-08-20 21:13:25,291 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4957390.0, ans=0.1 2024-08-20 21:13:31,932 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4957390.0, ans=0.0 2024-08-20 21:13:34,006 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4957390.0, ans=0.125 2024-08-20 21:13:34,011 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4957390.0, ans=0.125 2024-08-20 21:13:37,103 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4957490.0, ans=0.1 2024-08-20 21:13:41,086 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.49 vs. limit=10.0 2024-08-20 21:13:44,071 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.797e+01 2.394e+01 2.668e+01 3.101e+01 4.157e+01, threshold=5.336e+01, percent-clipped=0.0 2024-08-20 21:14:05,460 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4957590.0, ans=0.125 2024-08-20 21:14:12,990 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 6800, loss[loss=0.1137, beats_loss=0.01194, ecapa_loss=0.0001267, whisper_loss=0.1004, over 22955.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01043, ecapa_loss=0.0001416, whisper_loss=0.09006, over 3852417.62 frames. 
], batch size: 93, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:14:18,399 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 24 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-20 21:14:21,807 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4957690.0, ans=0.125 2024-08-20 21:14:31,791 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 28 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-20 21:14:33,572 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4957790.0, ans=0.2 2024-08-20 21:14:33,575 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4957790.0, ans=0.125 2024-08-20 21:14:51,879 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.36 vs. limit=10.0 2024-08-20 21:14:52,416 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 24 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-20 21:15:14,728 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.55 vs. limit=15.0 2024-08-20 21:15:28,437 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.30 vs. limit=10.0 2024-08-20 21:15:39,288 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 6850, loss[loss=0.09746, beats_loss=0.00955, ecapa_loss=0.0001372, whisper_loss=0.08653, over 20587.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01045, ecapa_loss=0.000141, whisper_loss=0.08968, over 3808707.31 frames. 
], batch size: 82, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:15:53,467 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4958190.0, ans=0.125 2024-08-20 21:16:03,887 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4958290.0, ans=0.0 2024-08-20 21:16:26,173 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 25 from LS+wenet, 9 from Vox, 27 fro AS 2024-08-20 21:16:36,570 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.698e+01 2.275e+01 2.461e+01 2.676e+01 7.935e+01, threshold=4.923e+01, percent-clipped=1.0 2024-08-20 21:16:39,269 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4958490.0, ans=0.025 2024-08-20 21:16:57,816 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4958590.0, ans=0.0 2024-08-20 21:17:06,143 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 6900, loss[loss=0.09538, beats_loss=0.01109, ecapa_loss=0.0001371, whisper_loss=0.08292, over 21769.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0104, ecapa_loss=0.0001412, whisper_loss=0.08977, over 3814058.91 frames. ], batch size: 90, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:17:11,496 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 21:17:13,078 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4958690.0, ans=0.05 2024-08-20 21:17:29,711 INFO [train_multi_KD3.py:845] (1/4) A total of 95 cuts. 
26 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-20 21:17:46,335 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4958890.0, ans=0.2 2024-08-20 21:17:48,171 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4958890.0, ans=0.0 2024-08-20 21:17:48,176 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4958890.0, ans=0.125 2024-08-20 21:17:56,953 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4958990.0, ans=0.09899494936611666 2024-08-20 21:18:10,483 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4958990.0, ans=0.07 2024-08-20 21:18:24,180 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4959090.0, ans=0.0 2024-08-20 21:18:31,986 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 6950, loss[loss=0.1127, beats_loss=0.01048, ecapa_loss=0.0001451, whisper_loss=0.1008, over 22956.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.0105, ecapa_loss=0.0001412, whisper_loss=0.08901, over 3821961.97 frames. ], batch size: 93, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:18:42,320 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 13 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-20 21:18:42,665 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4959190.0, ans=0.0 2024-08-20 21:18:49,728 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4959290.0, ans=0.125 2024-08-20 21:18:58,094 INFO [train_multi_KD3.py:845] (1/4) A total of 54 cuts. 
20 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-20 21:19:15,226 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4959390.0, ans=0.5 2024-08-20 21:19:25,733 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.50 vs. limit=15.0 2024-08-20 21:19:29,585 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.895e+01 2.291e+01 2.502e+01 2.810e+01 1.652e+02, threshold=5.004e+01, percent-clipped=1.0 2024-08-20 21:19:36,398 INFO [train_multi_KD3.py:845] (1/4) A total of 95 cuts. 31 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-20 21:19:38,007 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 18 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-20 21:19:42,460 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.25 vs. limit=15.0 2024-08-20 21:19:58,700 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 7000, loss[loss=0.08789, beats_loss=0.01117, ecapa_loss=0.0001734, whisper_loss=0.07498, over 12479.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01047, ecapa_loss=0.0001398, whisper_loss=0.08932, over 3828078.40 frames. ], batch size: 51, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:20:03,006 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4959690.0, ans=0.0 2024-08-20 21:20:08,654 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4959690.0, ans=0.125 2024-08-20 21:20:10,853 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.55 vs. limit=15.0 2024-08-20 21:20:32,547 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 
25 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-20 21:20:33,134 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4959890.0, ans=0.2 2024-08-20 21:20:40,345 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4959890.0, ans=0.125 2024-08-20 21:21:01,809 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4959990.0, ans=0.125 2024-08-20 21:21:03,120 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4959990.0, ans=0.0 2024-08-20 21:21:06,462 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 18 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-20 21:21:13,262 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4960090.0, ans=0.2 2024-08-20 21:21:29,264 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 7050, loss[loss=0.09535, beats_loss=0.009947, ecapa_loss=0.0001524, whisper_loss=0.08388, over 13023.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01054, ecapa_loss=0.0001385, whisper_loss=0.0897, over 3828851.76 frames. ], batch size: 51, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:21:39,661 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 
26 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-20 21:21:47,507 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4960290.0, ans=0.0 2024-08-20 21:22:15,822 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4960390.0, ans=0.125 2024-08-20 21:22:19,124 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4960490.0, ans=0.2 2024-08-20 21:22:23,765 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.35 vs. limit=15.0 2024-08-20 21:22:25,551 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.286e+01 2.529e+01 2.848e+01 4.260e+01, threshold=5.059e+01, percent-clipped=0.0 2024-08-20 21:22:28,393 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4960490.0, ans=0.125 2024-08-20 21:22:31,605 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4960490.0, ans=0.125 2024-08-20 21:22:41,478 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 35 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-20 21:22:55,535 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 7100, loss[loss=0.09123, beats_loss=0.01162, ecapa_loss=0.0001393, whisper_loss=0.07822, over 12506.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01053, ecapa_loss=0.0001373, whisper_loss=0.09011, over 3848783.68 frames. 
], batch size: 51, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:23:00,295 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4960690.0, ans=0.1 2024-08-20 21:23:01,747 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4960690.0, ans=0.125 2024-08-20 21:23:17,761 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4960790.0, ans=0.125 2024-08-20 21:23:22,443 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.80 vs. limit=15.0 2024-08-20 21:23:27,010 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4960790.0, ans=0.0 2024-08-20 21:23:34,571 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=4960890.0, ans=15.0 2024-08-20 21:23:46,034 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 27 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-20 21:23:50,998 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.84 vs. 
limit=12.0 2024-08-20 21:23:54,889 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4960990.0, ans=0.125 2024-08-20 21:24:12,866 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4961090.0, ans=0.2 2024-08-20 21:24:19,906 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4961090.0, ans=0.125 2024-08-20 21:24:23,365 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4961190.0, ans=0.125 2024-08-20 21:24:24,145 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 7150, loss[loss=0.1127, beats_loss=0.007862, ecapa_loss=0.0001641, whisper_loss=0.1032, over 15628.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0105, ecapa_loss=0.0001384, whisper_loss=0.0902, over 3868196.77 frames. ], batch size: 63, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:24:25,513 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4961190.0, ans=0.125 2024-08-20 21:24:28,608 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4961190.0, ans=0.0 2024-08-20 21:24:42,811 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4961290.0, ans=0.125 2024-08-20 21:24:42,866 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4961290.0, ans=0.1 2024-08-20 21:24:49,405 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 
25 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-20 21:25:02,542 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4961390.0, ans=0.0 2024-08-20 21:25:02,546 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4961390.0, ans=0.2 2024-08-20 21:25:17,653 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 25 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-20 21:25:19,151 INFO [train_multi_KD3.py:845] (1/4) A total of 52 cuts. 16 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-20 21:25:20,933 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 22 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-20 21:25:21,933 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.246e+01 2.471e+01 2.747e+01 3.291e+02, threshold=4.942e+01, percent-clipped=1.0 2024-08-20 21:25:31,918 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4961490.0, ans=0.125 2024-08-20 21:25:51,113 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.31 vs. limit=22.5 2024-08-20 21:25:51,815 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 7200, loss[loss=0.09626, beats_loss=0.01176, ecapa_loss=0.0001142, whisper_loss=0.08336, over 22501.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01047, ecapa_loss=0.000139, whisper_loss=0.08972, over 3873070.46 frames. 
], batch size: 90, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:26:01,398 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4961690.0, ans=0.125 2024-08-20 21:26:32,295 WARNING [optim.py:496] (1/4) Scaling gradients by 0.05625057592988014, model_norm_threshold=49.41666793823242 2024-08-20 21:26:32,452 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.17, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.330e+05, grad_sumsq=1.330e+05, orig_rms_sq=1.000e+00 2024-08-20 21:26:32,731 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 33 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-20 21:26:38,417 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4961890.0, ans=0.125 2024-08-20 21:26:47,122 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4961990.0, ans=0.2 2024-08-20 21:27:13,630 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 15 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-20 21:27:13,836 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4962090.0, ans=0.5 2024-08-20 21:27:21,184 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 7250, loss[loss=0.08565, beats_loss=0.01066, ecapa_loss=0.0001645, whisper_loss=0.07335, over 21260.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01039, ecapa_loss=0.0001398, whisper_loss=0.08956, over 3848154.16 frames. 
], batch size: 92, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:27:27,143 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4962190.0, ans=0.0 2024-08-20 21:27:41,054 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.415e-01 2024-08-20 21:27:42,602 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 19 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-20 21:27:44,337 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4962290.0, ans=0.1 2024-08-20 21:27:46,123 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4962290.0, ans=0.2 2024-08-20 21:27:51,133 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4962290.0, ans=0.2 2024-08-20 21:28:18,696 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.295e+01 2.557e+01 2.872e+01 8.785e+02, threshold=5.114e+01, percent-clipped=5.0 2024-08-20 21:28:30,288 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.73 vs. limit=15.0 2024-08-20 21:28:49,184 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 7300, loss[loss=0.09991, beats_loss=0.01144, ecapa_loss=0.0001137, whisper_loss=0.08734, over 19776.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01044, ecapa_loss=0.0001394, whisper_loss=0.0893, over 3826650.48 frames. ], batch size: 77, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:28:54,932 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.80 vs. 
limit=15.0 2024-08-20 21:28:55,957 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 16 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-20 21:28:59,081 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 29 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-20 21:29:08,978 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 21 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-20 21:29:21,366 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4962890.0, ans=0.2 2024-08-20 21:29:30,064 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4962890.0, ans=0.125 2024-08-20 21:29:36,698 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 20 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-20 21:29:40,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4962990.0, ans=0.0 2024-08-20 21:29:53,266 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4962990.0, ans=0.2 2024-08-20 21:30:12,190 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4963090.0, ans=0.125 2024-08-20 21:30:15,011 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 7350, loss[loss=0.07853, beats_loss=0.01224, ecapa_loss=0.0001213, whisper_loss=0.06508, over 19728.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01045, ecapa_loss=0.0001405, whisper_loss=0.08925, over 3829201.15 frames. ], batch size: 79, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:30:23,028 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 
35 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-20 21:30:31,208 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4963290.0, ans=0.1 2024-08-20 21:30:52,437 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4963390.0, ans=0.1 2024-08-20 21:30:57,810 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4963390.0, ans=0.125 2024-08-20 21:31:11,464 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.918e+01 2.257e+01 2.510e+01 2.739e+01 2.616e+02, threshold=5.019e+01, percent-clipped=1.0 2024-08-20 21:31:13,714 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4963490.0, ans=0.0 2024-08-20 21:31:13,771 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4963490.0, ans=0.2 2024-08-20 21:31:29,727 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4963590.0, ans=0.0 2024-08-20 21:31:40,856 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 7400, loss[loss=0.1122, beats_loss=0.00852, ecapa_loss=0.0001249, whisper_loss=0.1025, over 16103.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01039, ecapa_loss=0.0001401, whisper_loss=0.09009, over 3847503.14 frames. ], batch size: 60, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:31:46,222 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 
29 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-20 21:31:51,682 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4963690.0, ans=0.125 2024-08-20 21:31:53,957 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.24 vs. limit=15.0 2024-08-20 21:32:26,594 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 18 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-20 21:32:36,276 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4963990.0, ans=0.0 2024-08-20 21:33:03,198 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4964090.0, ans=0.07 2024-08-20 21:33:07,102 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 21 from LS+wenet, 29 from Vox, 44 fro AS 2024-08-20 21:33:07,798 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0 2024-08-20 21:33:09,744 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 7450, loss[loss=0.09434, beats_loss=0.01231, ecapa_loss=8.568e-05, whisper_loss=0.08118, over 14899.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01038, ecapa_loss=0.0001404, whisper_loss=0.08989, over 3833596.04 frames. 
], batch size: 55, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:33:36,118 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4964290.0, ans=0.125 2024-08-20 21:33:46,830 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4964390.0, ans=0.05 2024-08-20 21:33:50,460 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.80 vs. limit=15.0 2024-08-20 21:34:03,770 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 23 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-20 21:34:08,750 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.727e+01 2.276e+01 2.553e+01 2.837e+01 3.852e+01, threshold=5.106e+01, percent-clipped=0.0 2024-08-20 21:34:12,524 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 17 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-20 21:34:23,185 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 15 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-20 21:34:39,231 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 7500, loss[loss=0.1019, beats_loss=0.008567, ecapa_loss=0.0001405, whisper_loss=0.09189, over 15089.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01041, ecapa_loss=0.0001387, whisper_loss=0.0892, over 3819266.67 frames. 
], batch size: 60, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:34:39,704 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4964690.0, ans=0.2 2024-08-20 21:35:05,903 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4964790.0, ans=0.125 2024-08-20 21:35:43,096 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.83 vs. limit=15.0 2024-08-20 21:35:45,869 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4964990.0, ans=0.125 2024-08-20 21:35:48,586 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 15 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-20 21:36:03,187 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4965090.0, ans=0.1 2024-08-20 21:36:05,644 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 7550, loss[loss=0.1133, beats_loss=0.008739, ecapa_loss=0.0001439, whisper_loss=0.1031, over 22820.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01038, ecapa_loss=0.0001394, whisper_loss=0.08907, over 3809923.19 frames. ], batch size: 89, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:36:11,064 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 24 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-20 21:36:35,628 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4965290.0, ans=0.125 2024-08-20 21:36:35,891 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.67 vs. 
limit=15.0 2024-08-20 21:36:41,124 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4965390.0, ans=0.125 2024-08-20 21:36:43,873 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 16 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-20 21:37:02,544 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.817e+01 2.306e+01 2.507e+01 2.711e+01 6.032e+01, threshold=5.014e+01, percent-clipped=1.0 2024-08-20 21:37:11,706 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4965490.0, ans=0.125 2024-08-20 21:37:17,240 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 26 from LS+wenet, 32 from Vox, 33 fro AS 2024-08-20 21:37:31,658 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 7600, loss[loss=0.0907, beats_loss=0.0119, ecapa_loss=0.000122, whisper_loss=0.07758, over 23295.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01034, ecapa_loss=0.0001386, whisper_loss=0.08992, over 3835942.78 frames. ], batch size: 94, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:37:50,635 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4965790.0, ans=0.125 2024-08-20 21:37:52,640 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4965790.0, ans=0.125 2024-08-20 21:37:56,167 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.65 vs. 
limit=15.0 2024-08-20 21:38:16,993 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4965890.0, ans=0.125 2024-08-20 21:38:23,778 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.48 vs. limit=15.0 2024-08-20 21:38:36,006 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=4965990.0, ans=0.95 2024-08-20 21:38:47,416 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 21:38:51,189 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4966090.0, ans=0.1 2024-08-20 21:38:56,964 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 7650, loss[loss=0.1028, beats_loss=0.008603, ecapa_loss=0.000141, whisper_loss=0.09276, over 20141.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01035, ecapa_loss=0.0001388, whisper_loss=0.08943, over 3825949.17 frames. ], batch size: 79, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:39:26,441 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4966290.0, ans=0.0 2024-08-20 21:39:28,297 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 
16 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-20 21:39:37,840 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4966390.0, ans=0.1 2024-08-20 21:39:53,586 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.307e+01 2.519e+01 2.833e+01 3.884e+01, threshold=5.038e+01, percent-clipped=0.0 2024-08-20 21:40:21,886 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4966690.0, ans=0.125 2024-08-20 21:40:23,458 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 7700, loss[loss=0.09876, beats_loss=0.009428, ecapa_loss=0.0001135, whisper_loss=0.0882, over 17130.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01034, ecapa_loss=0.0001388, whisper_loss=0.08926, over 3785490.38 frames. ], batch size: 64, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:40:23,698 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 19 from LS+wenet, 25 from Vox, 21 fro AS 2024-08-20 21:41:04,582 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4966890.0, ans=0.2 2024-08-20 21:41:16,668 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4966990.0, ans=0.125 2024-08-20 21:41:20,841 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.17 vs. limit=15.0 2024-08-20 21:41:25,261 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4966990.0, ans=0.125 2024-08-20 21:41:25,312 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4966990.0, ans=0.125 2024-08-20 21:41:39,974 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 
32 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-20 21:41:48,696 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 7750, loss[loss=0.09721, beats_loss=0.009666, ecapa_loss=0.0001308, whisper_loss=0.08623, over 15892.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01033, ecapa_loss=0.0001394, whisper_loss=0.08954, over 3798267.18 frames. ], batch size: 64, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:41:53,545 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 24 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-20 21:42:07,230 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.92 vs. limit=15.0 2024-08-20 21:42:24,444 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 16 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-20 21:42:37,096 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 23 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-20 21:42:39,071 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 26 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-20 21:42:42,437 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 25 from LS+wenet, 36 from Vox, 28 fro AS 2024-08-20 21:42:44,633 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4967490.0, ans=0.125 2024-08-20 21:42:47,278 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.264e+01 2.525e+01 2.747e+01 3.905e+01, threshold=5.051e+01, percent-clipped=0.0 2024-08-20 21:42:49,693 INFO [train_multi_KD3.py:845] (1/4) A total of 95 cuts. 33 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-20 21:43:00,545 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.92 vs. 
limit=15.0 2024-08-20 21:43:16,717 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 7800, loss[loss=0.09133, beats_loss=0.01082, ecapa_loss=0.0001344, whisper_loss=0.07916, over 17116.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01029, ecapa_loss=0.0001389, whisper_loss=0.08993, over 3806650.41 frames. ], batch size: 67, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:43:17,291 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4967690.0, ans=0.125 2024-08-20 21:43:21,064 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 20 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-20 21:43:29,879 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4967690.0, ans=0.125 2024-08-20 21:43:45,103 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 16 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-20 21:43:49,050 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.38 vs. limit=22.5 2024-08-20 21:44:01,952 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 22 from LS+wenet, 27 from Vox, 45 fro AS 2024-08-20 21:44:02,693 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4967890.0, ans=0.0 2024-08-20 21:44:13,840 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4967990.0, ans=0.0 2024-08-20 21:44:21,965 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 25 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-20 21:44:43,122 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 7850, loss[loss=0.08448, beats_loss=0.0124, ecapa_loss=0.0001234, whisper_loss=0.07085, over 21833.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01033, ecapa_loss=0.0001384, whisper_loss=0.08946, over 3825413.44 frames. 
], batch size: 89, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:44:52,526 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 26 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-20 21:45:05,737 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.70 vs. limit=15.0 2024-08-20 21:45:05,768 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.63 vs. limit=15.0 2024-08-20 21:45:41,492 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.317e+01 2.497e+01 2.913e+01 5.826e+01, threshold=4.993e+01, percent-clipped=1.0 2024-08-20 21:45:47,498 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4968490.0, ans=0.0 2024-08-20 21:46:11,178 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 7900, loss[loss=0.09101, beats_loss=0.01166, ecapa_loss=0.0001597, whisper_loss=0.07775, over 16852.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01036, ecapa_loss=0.000138, whisper_loss=0.08988, over 3849667.58 frames. ], batch size: 71, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:46:18,374 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 23 from LS+wenet, 24 from Vox, 47 fro AS 2024-08-20 21:46:28,911 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4968790.0, ans=0.125 2024-08-20 21:46:30,302 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 24 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-20 21:46:39,090 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4968790.0, ans=0.0 2024-08-20 21:46:48,778 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 
20 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-20 21:46:56,157 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 25 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-20 21:47:00,085 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4968890.0, ans=0.0 2024-08-20 21:47:22,916 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 38 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-20 21:47:32,790 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4969090.0, ans=0.0 2024-08-20 21:47:36,383 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4969090.0, ans=0.125 2024-08-20 21:47:38,922 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 7950, loss[loss=0.1202, beats_loss=0.007272, ecapa_loss=0.0001421, whisper_loss=0.1115, over 16845.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01033, ecapa_loss=0.0001382, whisper_loss=0.09015, over 3831390.01 frames. ], batch size: 65, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:47:41,347 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4969190.0, ans=0.1 2024-08-20 21:47:50,850 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.94 vs. limit=15.0 2024-08-20 21:48:18,590 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 27 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-20 21:48:21,828 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 20 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-20 21:48:24,279 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.23 vs. 
limit=15.0 2024-08-20 21:48:29,584 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4969390.0, ans=0.1 2024-08-20 21:48:36,374 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 19 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-20 21:48:37,321 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.335e+01 2.581e+01 2.814e+01 4.962e+01, threshold=5.162e+01, percent-clipped=0.0 2024-08-20 21:48:48,854 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.12 vs. limit=15.0 2024-08-20 21:48:55,916 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.683e+05 2024-08-20 21:48:58,955 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4969590.0, ans=0.0 2024-08-20 21:49:07,275 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 8000, loss[loss=0.1103, beats_loss=0.008804, ecapa_loss=0.0001669, whisper_loss=0.09982, over 16127.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0103, ecapa_loss=0.0001393, whisper_loss=0.08991, over 3821567.00 frames. ], batch size: 65, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:49:07,506 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 26 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-20 21:49:17,628 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.16 vs. limit=22.5 2024-08-20 21:49:50,710 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4969890.0, ans=0.0 2024-08-20 21:49:55,941 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.58 vs. 
limit=15.0 2024-08-20 21:50:00,727 WARNING [optim.py:496] (1/4) Scaling gradients by 0.0865812599658966, model_norm_threshold=51.61667251586914 2024-08-20 21:50:00,886 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.3.norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.075e+04, grad_sumsq=5.075e+04, orig_rms_sq=1.000e+00 2024-08-20 21:50:20,659 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.28 vs. limit=15.0 2024-08-20 21:50:34,966 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 8050, loss[loss=0.1014, beats_loss=0.009477, ecapa_loss=0.0001351, whisper_loss=0.09061, over 12566.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01041, ecapa_loss=0.0001392, whisper_loss=0.08938, over 3834028.36 frames. ], batch size: 50, lr: 1.80e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 21:50:45,280 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.182e+01 2024-08-20 21:51:06,404 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 21:51:06,709 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.10 vs. limit=10.0 2024-08-20 21:51:20,057 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4970390.0, ans=0.0 2024-08-20 21:51:37,868 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.294e+01 2.558e+01 2.951e+01 5.962e+02, threshold=5.117e+01, percent-clipped=2.0 2024-08-20 21:52:03,749 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 8100, loss[loss=0.1175, beats_loss=0.01107, ecapa_loss=0.0001404, whisper_loss=0.105, over 23424.00 frames. 
], tot_loss[loss=0.1009, beats_loss=0.01048, ecapa_loss=0.0001387, whisper_loss=0.089, over 3826193.34 frames. ], batch size: 93, lr: 1.80e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 21:52:38,140 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4970790.0, ans=0.0 2024-08-20 21:52:53,710 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4970890.0, ans=0.0 2024-08-20 21:52:54,105 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.72 vs. limit=12.0 2024-08-20 21:53:02,688 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4970990.0, ans=0.0 2024-08-20 21:53:26,328 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4971090.0, ans=0.125 2024-08-20 21:53:32,966 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 29 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-20 21:53:34,293 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 8150, loss[loss=0.1211, beats_loss=0.008303, ecapa_loss=0.0001373, whisper_loss=0.1114, over 20270.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01051, ecapa_loss=0.0001378, whisper_loss=0.08933, over 3833374.21 frames. ], batch size: 78, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 21:53:46,995 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4971190.0, ans=0.125 2024-08-20 21:54:07,812 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 
31 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-20 21:54:37,845 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.647e+01 2.199e+01 2.482e+01 2.728e+01 1.075e+02, threshold=4.963e+01, percent-clipped=1.0 2024-08-20 21:54:41,433 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 15 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-20 21:54:41,830 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.79 vs. limit=22.5 2024-08-20 21:55:04,249 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 8200, loss[loss=0.07523, beats_loss=0.01074, ecapa_loss=0.0001433, whisper_loss=0.06306, over 18957.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01055, ecapa_loss=0.0001375, whisper_loss=0.08856, over 3818555.60 frames. ], batch size: 81, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 21:55:08,094 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 34 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-20 21:55:16,721 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 22 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-20 21:55:33,979 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4971790.0, ans=0.125 2024-08-20 21:56:02,180 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 23 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-20 21:56:03,618 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 26 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-20 21:56:11,794 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.80 vs. 
limit=6.0 2024-08-20 21:56:18,143 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4972090.0, ans=0.125 2024-08-20 21:56:24,294 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.31 vs. limit=22.5 2024-08-20 21:56:35,514 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 8250, loss[loss=0.1226, beats_loss=0.01026, ecapa_loss=0.0001552, whisper_loss=0.1107, over 23231.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01048, ecapa_loss=0.0001385, whisper_loss=0.08946, over 3828464.31 frames. ], batch size: 93, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 21:56:52,490 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4972290.0, ans=0.0 2024-08-20 21:57:07,635 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 29 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-20 21:57:24,369 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=4972390.0, ans=0.025 2024-08-20 21:57:27,502 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.48 vs. 
limit=22.5 2024-08-20 21:57:29,035 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4972490.0, ans=0.125 2024-08-20 21:57:29,088 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4972490.0, ans=0.0 2024-08-20 21:57:37,240 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.205e+01 2.481e+01 2.878e+01 4.142e+01, threshold=4.962e+01, percent-clipped=0.0 2024-08-20 21:57:40,212 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.01 vs. limit=10.0 2024-08-20 21:57:41,190 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.236e-01 2024-08-20 21:57:48,058 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4972590.0, ans=0.125 2024-08-20 21:57:56,426 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 33 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-20 21:58:02,894 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 8300, loss[loss=0.08933, beats_loss=0.0123, ecapa_loss=0.0001308, whisper_loss=0.07573, over 21947.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01047, ecapa_loss=0.0001384, whisper_loss=0.08911, over 3847279.02 frames. ], batch size: 92, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 21:58:23,196 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 22 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-20 21:58:30,353 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4972790.0, ans=0.125 2024-08-20 21:58:49,358 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 
28 from LS+wenet, 32 from Vox, 34 fro AS 2024-08-20 21:58:54,625 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4972990.0, ans=0.125 2024-08-20 21:58:56,416 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4972990.0, ans=0.0 2024-08-20 21:59:04,500 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.83 vs. limit=5.0 2024-08-20 21:59:22,872 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 29 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-20 21:59:31,379 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 8350, loss[loss=0.08613, beats_loss=0.01089, ecapa_loss=0.0001654, whisper_loss=0.07359, over 21132.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01043, ecapa_loss=0.0001392, whisper_loss=0.08926, over 3883360.79 frames. ], batch size: 89, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 21:59:46,245 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 30 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-20 21:59:49,445 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 27 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-20 21:59:53,117 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4973290.0, ans=0.1 2024-08-20 22:00:06,971 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.89 vs. 
limit=12.0 2024-08-20 22:00:09,662 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4973390.0, ans=0.0 2024-08-20 22:00:11,548 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4973390.0, ans=0.2 2024-08-20 22:00:15,340 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4973390.0, ans=0.0 2024-08-20 22:00:29,144 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=4973490.0, ans=10.0 2024-08-20 22:00:33,633 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.301e+01 2.484e+01 2.727e+01 5.310e+01, threshold=4.967e+01, percent-clipped=1.0 2024-08-20 22:00:36,432 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.74 vs. limit=22.5 2024-08-20 22:00:42,891 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 22 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-20 22:00:59,472 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 8400, loss[loss=0.09578, beats_loss=0.009189, ecapa_loss=0.0002, whisper_loss=0.08459, over 21548.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01043, ecapa_loss=0.0001395, whisper_loss=0.08928, over 3850268.16 frames. 
], batch size: 93, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:01:05,886 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4973690.0, ans=0.0 2024-08-20 22:01:09,351 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4973690.0, ans=0.2 2024-08-20 22:01:19,527 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4973790.0, ans=0.125 2024-08-20 22:01:27,240 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4973790.0, ans=0.09899494936611666 2024-08-20 22:01:52,053 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4973990.0, ans=0.125 2024-08-20 22:01:57,023 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 37 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-20 22:02:06,606 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=17.04 vs. limit=15.0 2024-08-20 22:02:16,619 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4974090.0, ans=0.125 2024-08-20 22:02:28,165 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 8450, loss[loss=0.1243, beats_loss=0.009145, ecapa_loss=0.0001407, whisper_loss=0.1137, over 23615.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01043, ecapa_loss=0.0001405, whisper_loss=0.08891, over 3847463.53 frames. 
], batch size: 90, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:02:35,305 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4974190.0, ans=0.125 2024-08-20 22:03:11,446 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4974390.0, ans=0.0 2024-08-20 22:03:32,163 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.656e+01 2.304e+01 2.514e+01 2.804e+01 1.040e+02, threshold=5.029e+01, percent-clipped=2.0 2024-08-20 22:03:43,904 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4974590.0, ans=0.0 2024-08-20 22:03:51,244 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4974590.0, ans=0.0 2024-08-20 22:03:59,162 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 8500, loss[loss=0.09479, beats_loss=0.01139, ecapa_loss=0.0001102, whisper_loss=0.0823, over 16756.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01051, ecapa_loss=0.00014, whisper_loss=0.08856, over 3860624.47 frames. ], batch size: 64, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:04:25,588 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.78 vs. limit=10.0 2024-08-20 22:04:42,579 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 23 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-20 22:04:50,239 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 27 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-20 22:05:03,928 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 
21 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-20 22:05:07,531 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4974990.0, ans=0.1 2024-08-20 22:05:11,323 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 21 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-20 22:05:31,230 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 8550, loss[loss=0.114, beats_loss=0.009994, ecapa_loss=0.0001129, whisper_loss=0.1029, over 14522.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01047, ecapa_loss=0.00014, whisper_loss=0.0887, over 3851373.97 frames. ], batch size: 55, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:05:45,806 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4975190.0, ans=0.09899494936611666 2024-08-20 22:05:47,333 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 22 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-20 22:05:47,641 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4975290.0, ans=0.125 2024-08-20 22:05:47,991 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=4975290.0, ans=15.0 2024-08-20 22:05:58,781 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.44 vs. limit=15.0 2024-08-20 22:06:02,314 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 35 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-20 22:06:07,349 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 
28 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-20 22:06:07,641 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4975390.0, ans=0.125 2024-08-20 22:06:33,001 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.271e+01 2.473e+01 2.779e+01 6.630e+01, threshold=4.947e+01, percent-clipped=2.0 2024-08-20 22:06:58,586 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4975690.0, ans=0.125 2024-08-20 22:06:59,606 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 8600, loss[loss=0.09008, beats_loss=0.01033, ecapa_loss=0.0001492, whisper_loss=0.07826, over 21058.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01044, ecapa_loss=0.0001388, whisper_loss=0.08931, over 3839519.79 frames. ], batch size: 87, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:07:13,845 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4975690.0, ans=0.125 2024-08-20 22:07:18,021 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. limit=6.0 2024-08-20 22:07:53,024 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4975890.0, ans=0.125 2024-08-20 22:07:55,498 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.57 vs. limit=10.0 2024-08-20 22:08:11,483 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 21 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-20 22:08:30,824 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 
21 from LS+wenet, 18 from Vox, 16 fro AS 2024-08-20 22:08:34,394 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 8650, loss[loss=0.09109, beats_loss=0.01142, ecapa_loss=0.0001337, whisper_loss=0.07833, over 14730.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01033, ecapa_loss=0.0001397, whisper_loss=0.0899, over 3846892.69 frames. ], batch size: 58, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:08:56,087 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4976290.0, ans=0.0 2024-08-20 22:09:05,911 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.33 vs. limit=15.0 2024-08-20 22:09:27,208 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4976490.0, ans=0.0 2024-08-20 22:09:27,548 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.03 vs. limit=22.5 2024-08-20 22:09:36,205 INFO [train_multi_KD3.py:845] (1/4) A total of 54 cuts. 
19 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-20 22:09:37,400 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.289e+01 2.459e+01 2.667e+01 5.043e+01, threshold=4.917e+01, percent-clipped=1.0 2024-08-20 22:09:53,053 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=4976590.0, ans=0.05 2024-08-20 22:10:00,579 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4976590.0, ans=0.2 2024-08-20 22:10:02,165 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4976590.0, ans=10.0 2024-08-20 22:10:04,895 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 8700, loss[loss=0.1056, beats_loss=0.008488, ecapa_loss=0.0001256, whisper_loss=0.09581, over 14755.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01023, ecapa_loss=0.0001403, whisper_loss=0.09034, over 3843919.04 frames. ], batch size: 56, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:10:08,625 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4976690.0, ans=0.1 2024-08-20 22:10:13,858 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 15 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-20 22:10:24,360 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4976790.0, ans=0.04949747468305833 2024-08-20 22:10:30,077 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4976790.0, ans=0.5 2024-08-20 22:11:18,981 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 
23 from LS+wenet, 14 from Vox, 14 fro AS 2024-08-20 22:11:39,276 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4977190.0, ans=0.125 2024-08-20 22:11:40,072 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 8750, loss[loss=0.09014, beats_loss=0.009104, ecapa_loss=0.0001525, whisper_loss=0.07952, over 17460.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01015, ecapa_loss=0.00014, whisper_loss=0.09106, over 3819083.37 frames. ], batch size: 71, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:11:49,884 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4977190.0, ans=0.125 2024-08-20 22:12:14,652 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 22:12:15,650 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 15 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-20 22:12:35,250 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4977490.0, ans=0.0 2024-08-20 22:12:41,553 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.311e+01 2.545e+01 2.845e+01 5.108e+01, threshold=5.089e+01, percent-clipped=1.0 2024-08-20 22:12:47,649 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4977490.0, ans=0.125 2024-08-20 22:12:47,739 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 22:12:52,862 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 28 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-20 22:12:59,806 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 
27 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-20 22:13:08,476 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 8800, loss[loss=0.113, beats_loss=0.008966, ecapa_loss=0.0001239, whisper_loss=0.1028, over 20400.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01017, ecapa_loss=0.0001388, whisper_loss=0.09112, over 3792010.90 frames. ], batch size: 74, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:13:09,061 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4977690.0, ans=0.0 2024-08-20 22:13:17,964 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 29 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-20 22:13:25,317 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 32 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-20 22:14:05,247 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4977990.0, ans=0.0 2024-08-20 22:14:13,043 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 17 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-20 22:14:15,481 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4977990.0, ans=0.07 2024-08-20 22:14:25,808 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4978090.0, ans=0.125 2024-08-20 22:14:29,872 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4978090.0, ans=0.125 2024-08-20 22:14:31,424 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 26 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-20 22:14:35,111 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 
22 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-20 22:14:36,181 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 8850, loss[loss=0.09522, beats_loss=0.009042, ecapa_loss=0.0001382, whisper_loss=0.0848, over 19453.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0102, ecapa_loss=0.0001394, whisper_loss=0.09039, over 3762255.58 frames. ], batch size: 77, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:14:47,622 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4978190.0, ans=0.0 2024-08-20 22:15:03,174 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 15 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-20 22:15:12,957 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4978390.0, ans=0.125 2024-08-20 22:15:22,118 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.96 vs. limit=15.0 2024-08-20 22:15:23,034 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4978390.0, ans=0.125 2024-08-20 22:15:25,340 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.79 vs. limit=15.0 2024-08-20 22:15:28,263 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 22:15:31,439 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=4978490.0, ans=0.05 2024-08-20 22:15:37,880 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.277e+01 2.445e+01 2.853e+01 5.587e+01, threshold=4.890e+01, percent-clipped=1.0 2024-08-20 22:15:41,723 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 
33 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-20 22:16:01,530 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 34 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-20 22:16:04,079 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 8900, loss[loss=0.0886, beats_loss=0.01089, ecapa_loss=0.0001449, whisper_loss=0.07627, over 21711.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01026, ecapa_loss=0.0001385, whisper_loss=0.09018, over 3781076.29 frames. ], batch size: 90, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:16:10,295 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4978690.0, ans=0.0 2024-08-20 22:16:26,281 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4978790.0, ans=0.125 2024-08-20 22:17:09,170 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.61 vs. limit=15.0 2024-08-20 22:17:29,174 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4979090.0, ans=0.0 2024-08-20 22:17:32,253 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 8950, loss[loss=0.0963, beats_loss=0.01086, ecapa_loss=0.0001453, whisper_loss=0.08398, over 22807.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0103, ecapa_loss=0.0001382, whisper_loss=0.09054, over 3802078.14 frames. ], batch size: 93, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:17:36,170 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4979190.0, ans=0.125 2024-08-20 22:17:37,288 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 
26 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-20 22:17:58,853 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.95 vs. limit=15.0 2024-08-20 22:18:03,413 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 26 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-20 22:18:32,860 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.269e+01 2.591e+01 2.866e+01 4.170e+01, threshold=5.183e+01, percent-clipped=0.0 2024-08-20 22:18:33,125 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 26 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-20 22:18:42,175 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4979590.0, ans=0.0 2024-08-20 22:18:42,216 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4979590.0, ans=0.0 2024-08-20 22:18:55,792 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 19 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-20 22:18:59,282 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 9000, loss[loss=0.08201, beats_loss=0.009989, ecapa_loss=0.0001581, whisper_loss=0.07044, over 15264.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01035, ecapa_loss=0.0001387, whisper_loss=0.09015, over 3803869.52 frames. ], batch size: 65, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:18:59,282 INFO [train_multi_KD3.py:1140] (1/4) Computing validation loss 2024-08-20 22:19:38,138 INFO [train_multi_KD3.py:1150] (1/4) Epoch 34, validation on ASR_libri: loss=0.2541, beats_loss=0, ecapa_loss=0.0005128, whisper_loss=0.249, over 931116.00 frames. 2024-08-20 22:20:04,804 INFO [train_multi_KD3.py:1150] (1/4) Epoch 34, validation on SV_voxceleb1: loss=0.003932, beats_loss=0, ecapa_loss=0.0003932, whisper_loss=0, over 944235.00 frames. 
2024-08-20 22:20:35,989 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.9811, 4.7379, 4.8257, 4.8596], device='cuda:1') 2024-08-20 22:21:44,429 INFO [train_multi_KD3.py:1150] (1/4) Epoch 34, validation on AT_audioset: loss=0.02294, beats_loss=0.02294, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-20 22:21:44,432 INFO [train_multi_KD3.py:1156] (1/4) Maximum memory allocated so far is 30838MB 2024-08-20 22:21:46,364 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4979690.0, ans=0.1 2024-08-20 22:22:08,814 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4979790.0, ans=0.0 2024-08-20 22:22:20,304 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 23 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-20 22:22:24,210 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4979890.0, ans=0.0 2024-08-20 22:22:32,078 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 25 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-20 22:22:39,415 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4979990.0, ans=0.0 2024-08-20 22:23:10,099 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 30 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-20 22:23:11,220 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 9050, loss[loss=0.109, beats_loss=0.01049, ecapa_loss=0.0001781, whisper_loss=0.09674, over 21579.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01039, ecapa_loss=0.0001385, whisper_loss=0.09009, over 3801006.16 frames. 
], batch size: 91, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:23:17,496 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4980190.0, ans=0.0 2024-08-20 22:23:33,054 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4980290.0, ans=0.125 2024-08-20 22:23:55,078 WARNING [optim.py:496] (1/4) Scaling gradients by 0.043528925627470016, model_norm_threshold=51.82819747924805 2024-08-20 22:23:55,238 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.19, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.654e+05, grad_sumsq=2.654e+05, orig_rms_sq=1.000e+00 2024-08-20 22:24:01,820 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.73 vs. limit=15.0 2024-08-20 22:24:04,978 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 13 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-20 22:24:06,786 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4980490.0, ans=0.125 2024-08-20 22:24:07,007 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.66 vs. limit=15.0 2024-08-20 22:24:11,540 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.674e+01 2.252e+01 2.512e+01 2.739e+01 1.191e+03, threshold=5.024e+01, percent-clipped=1.0 2024-08-20 22:24:36,501 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 9100, loss[loss=0.09768, beats_loss=0.009981, ecapa_loss=0.000113, whisper_loss=0.08657, over 19701.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01044, ecapa_loss=0.0001381, whisper_loss=0.09035, over 3816061.68 frames. 
], batch size: 77, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:24:48,614 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 25 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-20 22:24:56,644 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 15 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-20 22:25:24,574 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.53 vs. limit=15.0 2024-08-20 22:25:31,861 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4980990.0, ans=0.0 2024-08-20 22:25:33,623 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4980990.0, ans=0.125 2024-08-20 22:25:50,688 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 30 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-20 22:25:53,185 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 21 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-20 22:25:59,826 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 9150, loss[loss=0.1115, beats_loss=0.008265, ecapa_loss=0.000106, whisper_loss=0.1022, over 18706.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01044, ecapa_loss=0.000137, whisper_loss=0.0898, over 3796984.44 frames. 
], batch size: 66, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:26:03,823 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4981190.0, ans=0.125 2024-08-20 22:26:06,796 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4981190.0, ans=0.0 2024-08-20 22:26:09,897 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4981190.0, ans=0.0 2024-08-20 22:26:21,462 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.28 vs. limit=15.0 2024-08-20 22:26:34,607 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 24 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-20 22:26:36,949 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4981390.0, ans=0.125 2024-08-20 22:26:40,031 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 21 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-20 22:26:54,023 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4981490.0, ans=0.0 2024-08-20 22:26:58,644 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.233e+01 2.459e+01 2.675e+01 3.716e+01, threshold=4.917e+01, percent-clipped=0.0 2024-08-20 22:27:05,701 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 25 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-20 22:27:07,607 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 24 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-20 22:27:15,041 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.21 vs. limit=22.5 2024-08-20 22:27:20,278 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 
15 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-20 22:27:23,682 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 9200, loss[loss=0.1184, beats_loss=0.007455, ecapa_loss=0.0001529, whisper_loss=0.1094, over 17596.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01042, ecapa_loss=0.0001381, whisper_loss=0.08991, over 3782458.25 frames. ], batch size: 66, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:27:42,425 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.12 vs. limit=12.0 2024-08-20 22:27:43,317 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 18 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-20 22:27:46,992 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4981790.0, ans=0.125 2024-08-20 22:28:06,070 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4981890.0, ans=0.0 2024-08-20 22:28:21,635 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4981990.0, ans=0.0 2024-08-20 22:28:21,663 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4981990.0, ans=0.125 2024-08-20 22:28:31,590 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 35 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-20 22:28:31,794 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4982090.0, ans=0.0 2024-08-20 22:28:34,854 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 
16 from LS+wenet, 7 from Vox, 27 fro AS 2024-08-20 22:28:42,479 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4982090.0, ans=0.0 2024-08-20 22:28:48,012 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 9250, loss[loss=0.1138, beats_loss=0.0122, ecapa_loss=0.0001437, whisper_loss=0.1002, over 21429.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0104, ecapa_loss=0.0001375, whisper_loss=0.09045, over 3786674.26 frames. ], batch size: 86, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:28:55,743 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=4982190.0, ans=10.0 2024-08-20 22:29:14,060 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 22 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-20 22:29:15,991 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4982290.0, ans=0.1 2024-08-20 22:29:30,280 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4982390.0, ans=0.0 2024-08-20 22:29:31,972 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4982390.0, ans=0.125 2024-08-20 22:29:41,701 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4982490.0, ans=0.125 2024-08-20 22:29:47,117 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4982490.0, ans=0.0 2024-08-20 22:29:49,308 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.773e+01 2.258e+01 2.506e+01 2.833e+01 3.659e+01, threshold=5.011e+01, percent-clipped=0.0 2024-08-20 22:30:00,523 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 
15 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-20 22:30:08,963 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 25 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-20 22:30:11,348 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 27 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-20 22:30:15,778 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 9300, loss[loss=0.1079, beats_loss=0.008461, ecapa_loss=0.0001788, whisper_loss=0.09764, over 22096.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01039, ecapa_loss=0.0001391, whisper_loss=0.09004, over 3813243.63 frames. ], batch size: 90, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:30:32,546 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4982790.0, ans=0.1 2024-08-20 22:30:58,430 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4982890.0, ans=0.1 2024-08-20 22:30:58,488 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4982890.0, ans=0.125 2024-08-20 22:30:59,929 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 22:31:01,994 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4982890.0, ans=0.125 2024-08-20 22:31:15,821 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4982990.0, ans=0.2 2024-08-20 22:31:22,350 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 32 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-20 22:31:42,221 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 9350, loss[loss=0.06385, beats_loss=0.01238, ecapa_loss=0.000171, whisper_loss=0.04976, over 17060.00 frames. 
], tot_loss[loss=0.1022, beats_loss=0.01039, ecapa_loss=0.0001378, whisper_loss=0.09046, over 3826026.83 frames. ], batch size: 75, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:32:03,391 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4983290.0, ans=0.125 2024-08-20 22:32:07,133 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.28 vs. limit=22.5 2024-08-20 22:32:17,494 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4983390.0, ans=0.125 2024-08-20 22:32:19,741 WARNING [optim.py:496] (1/4) Scaling gradients by 0.011810386553406715, model_norm_threshold=50.11380386352539 2024-08-20 22:32:19,900 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.320e+06, grad_sumsq=2.158e+08, orig_rms_sq=1.075e-02 2024-08-20 22:32:34,577 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 
23 from LS+wenet, 12 from Vox, 18 fro AS 2024-08-20 22:32:41,180 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.343e+01 2.545e+01 2.927e+01 4.243e+03, threshold=5.090e+01, percent-clipped=3.0 2024-08-20 22:32:46,038 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4983490.0, ans=0.2 2024-08-20 22:32:46,073 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4983490.0, ans=0.1 2024-08-20 22:32:49,371 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4983590.0, ans=0.125 2024-08-20 22:33:06,822 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 9400, loss[loss=0.107, beats_loss=0.01006, ecapa_loss=0.0001471, whisper_loss=0.09548, over 23461.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01035, ecapa_loss=0.0001394, whisper_loss=0.09101, over 3810775.36 frames. ], batch size: 96, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:33:14,385 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.whiten.whitening_limit, batch_count=4983690.0, ans=12.0 2024-08-20 22:33:19,000 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4983690.0, ans=0.125 2024-08-20 22:33:19,372 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.06 vs. limit=22.5 2024-08-20 22:33:39,543 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4983790.0, ans=0.125 2024-08-20 22:33:47,134 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 
29 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-20 22:33:50,003 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2024-08-20 22:33:53,658 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 20 from LS+wenet, 30 from Vox, 43 fro AS 2024-08-20 22:34:01,283 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 18 from LS+wenet, 27 from Vox, 24 fro AS 2024-08-20 22:34:33,353 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 9450, loss[loss=0.1082, beats_loss=0.01028, ecapa_loss=0.0001588, whisper_loss=0.09634, over 19155.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01038, ecapa_loss=0.0001393, whisper_loss=0.09054, over 3816572.14 frames. ], batch size: 79, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:35:00,570 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 13 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-20 22:35:02,740 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 31 from LS+wenet, 30 from Vox, 27 fro AS 2024-08-20 22:35:22,966 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4984490.0, ans=0.0 2024-08-20 22:35:25,809 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.71 vs. limit=15.0 2024-08-20 22:35:32,707 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.335e+01 2.553e+01 2.784e+01 4.072e+01, threshold=5.106e+01, percent-clipped=0.0 2024-08-20 22:35:38,612 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 
25 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-20 22:35:42,219 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4984590.0, ans=0.125 2024-08-20 22:35:45,777 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4984590.0, ans=0.125 2024-08-20 22:35:58,731 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 9500, loss[loss=0.08715, beats_loss=0.009983, ecapa_loss=0.0001716, whisper_loss=0.07545, over 17570.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01034, ecapa_loss=0.00014, whisper_loss=0.09016, over 3790893.78 frames. ], batch size: 74, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:36:00,585 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=4984690.0, ans=0.2 2024-08-20 22:36:06,205 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4984690.0, ans=0.07 2024-08-20 22:36:07,537 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 24 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-20 22:36:13,488 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4984690.0, ans=0.125 2024-08-20 22:37:02,960 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 14 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-20 22:37:13,789 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4985090.0, ans=0.1 2024-08-20 22:37:26,243 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 9550, loss[loss=0.1128, beats_loss=0.00779, ecapa_loss=0.0001427, whisper_loss=0.1036, over 16342.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01027, ecapa_loss=0.0001396, whisper_loss=0.0909, over 3816608.77 frames. 
], batch size: 64, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:37:30,376 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 21 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-20 22:38:05,493 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 18 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-20 22:38:12,387 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 20 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-20 22:38:15,809 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4985490.0, ans=0.2 2024-08-20 22:38:17,340 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 18 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-20 22:38:18,923 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 15 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-20 22:38:24,975 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.187e+01 2.362e+01 2.605e+01 3.929e+01, threshold=4.725e+01, percent-clipped=0.0 2024-08-20 22:38:28,536 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.83 vs. limit=10.0 2024-08-20 22:38:39,282 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4985590.0, ans=0.0 2024-08-20 22:38:43,926 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 14 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-20 22:38:45,844 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 23 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-20 22:38:51,964 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 9600, loss[loss=0.08992, beats_loss=0.01186, ecapa_loss=0.0001132, whisper_loss=0.07693, over 17481.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01031, ecapa_loss=0.0001394, whisper_loss=0.09076, over 3813667.48 frames. 
], batch size: 69, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:39:05,034 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4985690.0, ans=0.1 2024-08-20 22:39:11,788 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 22 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-20 22:39:29,025 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4985890.0, ans=0.05 2024-08-20 22:39:35,264 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 28 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-20 22:39:55,415 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4985990.0, ans=0.125 2024-08-20 22:39:55,477 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4985990.0, ans=0.1 2024-08-20 22:40:21,436 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 9650, loss[loss=0.1081, beats_loss=0.008911, ecapa_loss=0.0001299, whisper_loss=0.09787, over 18945.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01039, ecapa_loss=0.0001392, whisper_loss=0.09007, over 3822180.92 frames. ], batch size: 73, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:40:37,038 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 18 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-20 22:40:46,260 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4986290.0, ans=0.1 2024-08-20 22:41:03,296 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4986390.0, ans=0.0 2024-08-20 22:41:12,192 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 
37 from LS+wenet, 21 from Vox, 33 from AS
2024-08-20 22:41:16,385 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4986490.0, ans=0.2
2024-08-20 22:41:18,234 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4986490.0, ans=0.125
2024-08-20 22:41:22,167 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.643e+01 2.288e+01 2.415e+01 2.707e+01 3.907e+01, threshold=4.829e+01, percent-clipped=0.0
2024-08-20 22:41:23,500 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.51 vs. limit=22.5
2024-08-20 22:41:29,762 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4986590.0, ans=0.0
2024-08-20 22:41:40,311 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4986590.0, ans=0.0
2024-08-20 22:41:48,628 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 9700, loss[loss=0.09893, beats_loss=0.01231, ecapa_loss=0.0001356, whisper_loss=0.08526, over 22135.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01037, ecapa_loss=0.0001388, whisper_loss=0.08981, over 3824174.02 frames. ], batch size: 90, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17
2024-08-20 22:42:11,674 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4986790.0, ans=0.1
2024-08-20 22:42:37,337 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 32 from LS+wenet, 17 from Vox, 42 from AS
2024-08-20 22:43:15,061 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 9750, loss[loss=0.09816, beats_loss=0.01024, ecapa_loss=0.0001591, whisper_loss=0.08632, over 19844.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01044, ecapa_loss=0.0001381, whisper_loss=0.08961, over 3809038.41 frames. ], batch size: 86, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17
2024-08-20 22:43:20,802 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 19 from LS+wenet, 13 from Vox, 33 from AS
2024-08-20 22:44:15,604 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.763e+01 2.197e+01 2.422e+01 2.722e+01 3.881e+01, threshold=4.844e+01, percent-clipped=0.0
2024-08-20 22:44:15,811 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 24 from LS+wenet, 9 from Vox, 17 from AS
2024-08-20 22:44:41,301 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 9800, loss[loss=0.1067, beats_loss=0.006843, ecapa_loss=0.0001656, whisper_loss=0.09823, over 16963.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01039, ecapa_loss=0.0001379, whisper_loss=0.09006, over 3789851.13 frames. ], batch size: 64, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17
2024-08-20 22:44:41,693 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4987690.0, ans=0.125
2024-08-20 22:44:41,754 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=4987690.0, ans=10.0
2024-08-20 22:44:48,374 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 18 from LS+wenet, 11 from Vox, 26 from AS
2024-08-20 22:44:48,613 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4987690.0, ans=0.0
2024-08-20 22:44:55,432 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 24 from LS+wenet, 15 from Vox, 35 from AS
2024-08-20 22:45:32,219 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 29 from LS+wenet, 18 from Vox, 27 from AS
2024-08-20 22:45:44,439 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 18 from LS+wenet, 25 from Vox, 25 from AS
2024-08-20 22:45:48,730 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 27 from LS+wenet, 31 from Vox, 34 from AS
2024-08-20 22:45:55,302 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4988090.0, ans=0.1
2024-08-20 22:45:58,945 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.11 vs. limit=15.0
2024-08-20 22:46:06,811 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 9850, loss[loss=0.1032, beats_loss=0.01034, ecapa_loss=0.0001235, whisper_loss=0.0916, over 23171.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01042, ecapa_loss=0.0001375, whisper_loss=0.09023, over 3797796.64 frames. ], batch size: 92, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17
2024-08-20 22:46:07,473 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4988190.0, ans=0.125
2024-08-20 22:46:16,508 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.69 vs. limit=15.0
2024-08-20 22:46:37,309 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4988290.0, ans=0.125
2024-08-20 22:47:07,554 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.349e+01 2.590e+01 2.982e+01 4.203e+01, threshold=5.180e+01, percent-clipped=0.0
2024-08-20 22:47:07,785 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 27 from LS+wenet, 31 from Vox, 34 from AS
2024-08-20 22:47:13,436 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 17 from LS+wenet, 21 from Vox, 41 from AS
2024-08-20 22:47:34,500 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 9900, loss[loss=0.08743, beats_loss=0.01143, ecapa_loss=0.0001087, whisper_loss=0.07491, over 18530.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0105, ecapa_loss=0.0001373, whisper_loss=0.0897, over 3814769.82 frames. ], batch size: 72, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17
2024-08-20 22:47:38,495 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4988690.0, ans=0.1
2024-08-20 22:47:39,631 INFO [train_multi_KD3.py:845] (1/4) A total of 54 cuts. 24 from LS+wenet, 9 from Vox, 21 from AS
2024-08-20 22:47:46,951 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4988690.0, ans=0.125
2024-08-20 22:47:50,557 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4988790.0, ans=0.125
2024-08-20 22:47:57,611 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.604e+00
2024-08-20 22:47:59,410 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.422e+01
2024-08-20 22:48:09,788 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4988890.0, ans=0.1
2024-08-20 22:48:20,030 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 25 from LS+wenet, 22 from Vox, 18 from AS
2024-08-20 22:48:33,979 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 22 from LS+wenet, 23 from Vox, 32 from AS
2024-08-20 22:48:42,983 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4989090.0, ans=0.125
2024-08-20 22:48:54,936 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4989090.0, ans=0.125
2024-08-20 22:49:01,534 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 9950, loss[loss=0.1121, beats_loss=0.009542, ecapa_loss=0.0001473, whisper_loss=0.1011, over 22791.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01045, ecapa_loss=0.0001373, whisper_loss=0.08947, over 3812519.23 frames. ], batch size: 90, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17
2024-08-20 22:49:02,350 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4989190.0, ans=0.0
2024-08-20 22:49:07,448 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4989190.0, ans=0.1
2024-08-20 22:49:49,986 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.61 vs. limit=10.0
2024-08-20 22:49:54,437 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 28 from LS+wenet, 20 from Vox, 40 from AS
2024-08-20 22:49:59,965 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.52 vs. limit=22.5
2024-08-20 22:50:02,138 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.629e+01 2.261e+01 2.497e+01 2.811e+01 6.221e+01, threshold=4.994e+01, percent-clipped=1.0
2024-08-20 22:50:24,201 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4989590.0, ans=0.125
2024-08-20 22:50:28,708 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 10000, loss[loss=0.1014, beats_loss=0.009147, ecapa_loss=0.0001657, whisper_loss=0.09057, over 16761.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01036, ecapa_loss=0.0001383, whisper_loss=0.09019, over 3820857.49 frames. ], batch size: 68, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17
2024-08-20 22:50:31,549 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4989690.0, ans=0.125
2024-08-20 22:50:55,052 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4989790.0, ans=0.125
2024-08-20 22:51:16,876 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4989890.0, ans=0.2
2024-08-20 22:51:37,864 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4989990.0, ans=0.2
2024-08-20 22:51:42,210 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.86 vs. limit=22.5
2024-08-20 22:51:46,523 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 26 from LS+wenet, 15 from Vox, 25 from AS
2024-08-20 22:51:48,026 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 14 from LS+wenet, 20 from Vox, 22 from AS
2024-08-20 22:51:55,394 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 21 from LS+wenet, 12 from Vox, 39 from AS
2024-08-20 22:51:58,447 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 10050, loss[loss=0.1229, beats_loss=0.008675, ecapa_loss=0.0001542, whisper_loss=0.1127, over 17331.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01035, ecapa_loss=0.0001383, whisper_loss=0.09066, over 3811915.75 frames. ], batch size: 69, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 22:51:58,844 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4990190.0, ans=0.1
2024-08-20 22:52:12,448 WARNING [optim.py:496] (1/4) Scaling gradients by 0.07415100187063217, model_norm_threshold=49.94480514526367
2024-08-20 22:52:12,605 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.37, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.662e+05, grad_sumsq=1.662e+05, orig_rms_sq=1.000e+00
2024-08-20 22:52:26,866 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4990290.0, ans=0.0
2024-08-20 22:52:31,099 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4990290.0, ans=0.125
2024-08-20 22:52:37,145 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 29 from LS+wenet, 28 from Vox, 36 from AS
2024-08-20 22:52:40,905 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 34 from LS+wenet, 14 from Vox, 40 from AS
2024-08-20 22:52:57,812 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 13 from LS+wenet, 13 from Vox, 24 from AS
2024-08-20 22:53:00,356 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.315e+01 2.541e+01 2.876e+01 6.736e+02, threshold=5.082e+01, percent-clipped=1.0
2024-08-20 22:53:28,272 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 10100, loss[loss=0.1144, beats_loss=0.01146, ecapa_loss=0.0001514, whisper_loss=0.1014, over 15894.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0104, ecapa_loss=0.0001389, whisper_loss=0.0905, over 3828212.25 frames. ], batch size: 68, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 22:53:49,632 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 35 from LS+wenet, 19 from Vox, 33 from AS
2024-08-20 22:54:14,739 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.52 vs. limit=22.5
2024-08-20 22:54:34,032 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.27 vs. limit=15.0
2024-08-20 22:54:45,561 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4991090.0, ans=0.95
2024-08-20 22:54:45,854 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.87 vs. limit=6.0
2024-08-20 22:54:49,072 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4991090.0, ans=0.125
2024-08-20 22:54:52,153 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4991090.0, ans=0.0
2024-08-20 22:54:54,994 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 10150, loss[loss=0.107, beats_loss=0.01077, ecapa_loss=0.0001282, whisper_loss=0.09493, over 21194.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01042, ecapa_loss=0.0001398, whisper_loss=0.09014, over 3821568.26 frames. ], batch size: 83, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 22:55:06,889 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.91 vs. limit=15.0
2024-08-20 22:55:07,640 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 20 from LS+wenet, 34 from Vox, 33 from AS
2024-08-20 22:55:08,787 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.25 vs. limit=15.0
2024-08-20 22:55:16,297 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 23 from LS+wenet, 25 from Vox, 40 from AS
2024-08-20 22:55:18,745 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4991290.0, ans=0.125
2024-08-20 22:55:18,755 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4991290.0, ans=0.0
2024-08-20 22:55:21,469 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 24 from LS+wenet, 21 from Vox, 35 from AS
2024-08-20 22:55:31,661 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4991390.0, ans=0.125
2024-08-20 22:55:56,414 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.245e+01 2.529e+01 2.822e+01 1.463e+02, threshold=5.058e+01, percent-clipped=1.0
2024-08-20 22:56:22,409 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 10200, loss[loss=0.09411, beats_loss=0.008957, ecapa_loss=0.0001511, whisper_loss=0.08364, over 19095.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01046, ecapa_loss=0.0001392, whisper_loss=0.08985, over 3812354.44 frames. ], batch size: 77, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 22:56:23,469 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.00 vs. limit=22.5
2024-08-20 22:56:36,548 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 25 from LS+wenet, 23 from Vox, 45 from AS
2024-08-20 22:56:37,496 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4991690.0, ans=0.125
2024-08-20 22:56:49,731 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4991790.0, ans=0.2
2024-08-20 22:57:29,622 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4991990.0, ans=0.5
2024-08-20 22:57:40,286 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4992090.0, ans=0.2
2024-08-20 22:57:54,883 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 10250, loss[loss=0.09841, beats_loss=0.01032, ecapa_loss=0.0001496, whisper_loss=0.08659, over 19204.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01051, ecapa_loss=0.0001383, whisper_loss=0.08951, over 3822060.97 frames. ], batch size: 76, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 22:58:50,570 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 18 from LS+wenet, 14 from Vox, 25 from AS
2024-08-20 22:58:59,596 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.769e+01 2.301e+01 2.546e+01 2.793e+01 3.961e+01, threshold=5.092e+01, percent-clipped=0.0
2024-08-20 22:59:11,473 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 13 from LS+wenet, 21 from Vox, 26 from AS
2024-08-20 22:59:26,741 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 10300, loss[loss=0.09455, beats_loss=0.009343, ecapa_loss=0.0001491, whisper_loss=0.08371, over 20064.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01051, ecapa_loss=0.0001404, whisper_loss=0.08917, over 3826579.78 frames. ], batch size: 81, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 22:59:29,159 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 22 from LS+wenet, 19 from Vox, 32 from AS
2024-08-20 22:59:35,097 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4992690.0, ans=0.125
2024-08-20 23:00:08,972 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4992890.0, ans=0.0
2024-08-20 23:00:24,140 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4992990.0, ans=0.07
2024-08-20 23:00:34,926 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4992990.0, ans=0.0
2024-08-20 23:00:50,104 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 32 from LS+wenet, 20 from Vox, 34 from AS
2024-08-20 23:00:50,312 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4993090.0, ans=0.0
2024-08-20 23:00:58,095 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 10350, loss[loss=0.108, beats_loss=0.008357, ecapa_loss=0.0001533, whisper_loss=0.09814, over 19867.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01047, ecapa_loss=0.0001399, whisper_loss=0.08993, over 3826947.95 frames. ], batch size: 81, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 23:01:02,444 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 23 from LS+wenet, 23 from Vox, 29 from AS
2024-08-20 23:01:18,806 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4993290.0, ans=0.0
2024-08-20 23:01:57,383 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.28 vs. limit=15.0
2024-08-20 23:02:01,077 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.402e+01 2.644e+01 3.028e+01 6.335e+01, threshold=5.289e+01, percent-clipped=1.0
2024-08-20 23:02:08,677 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 17 from LS+wenet, 23 from Vox, 24 from AS
2024-08-20 23:02:10,588 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 25 from LS+wenet, 21 from Vox, 28 from AS
2024-08-20 23:02:12,376 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4993590.0, ans=0.125
2024-08-20 23:02:14,302 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.53 vs. limit=15.0
2024-08-20 23:02:18,736 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 20 from LS+wenet, 16 from Vox, 37 from AS
2024-08-20 23:02:29,163 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 10400, loss[loss=0.1033, beats_loss=0.0102, ecapa_loss=0.0001438, whisper_loss=0.09165, over 22789.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01044, ecapa_loss=0.0001401, whisper_loss=0.08979, over 3783646.26 frames. ], batch size: 91, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 23:02:46,948 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 24 from LS+wenet, 17 from Vox, 22 from AS
2024-08-20 23:02:57,281 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4993790.0, ans=0.1
2024-08-20 23:03:02,725 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-20 23:03:09,948 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.92 vs. limit=22.5
2024-08-20 23:03:11,721 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=4993890.0, ans=0.025
2024-08-20 23:03:30,048 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.27 vs. limit=22.5
2024-08-20 23:03:40,492 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4994090.0, ans=0.125
2024-08-20 23:03:40,604 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-20 23:03:49,988 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 19 from LS+wenet, 15 from Vox, 30 from AS
2024-08-20 23:03:59,829 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 10450, loss[loss=0.1015, beats_loss=0.01036, ecapa_loss=0.0001517, whisper_loss=0.08966, over 18713.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01039, ecapa_loss=0.00014, whisper_loss=0.09028, over 3781503.66 frames. ], batch size: 76, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 23:04:04,656 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.24 vs. limit=15.0
2024-08-20 23:04:18,405 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4994290.0, ans=0.125
2024-08-20 23:04:21,226 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 21 from LS+wenet, 13 from Vox, 37 from AS
2024-08-20 23:05:04,038 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.282e+01 2.461e+01 2.662e+01 8.138e+01, threshold=4.922e+01, percent-clipped=1.0
2024-08-20 23:05:12,548 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4994590.0, ans=0.125
2024-08-20 23:05:21,483 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 14 from LS+wenet, 23 from Vox, 22 from AS
2024-08-20 23:05:28,492 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4994590.0, ans=0.0
2024-08-20 23:05:31,033 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 10500, loss[loss=0.1073, beats_loss=0.006624, ecapa_loss=0.0001294, whisper_loss=0.09941, over 16468.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01025, ecapa_loss=0.0001401, whisper_loss=0.09054, over 3801756.46 frames. ], batch size: 60, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 23:05:38,499 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 28 from LS+wenet, 19 from Vox, 33 from AS
2024-08-20 23:06:05,012 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4994890.0, ans=0.2
2024-08-20 23:06:06,764 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4994890.0, ans=0.125
2024-08-20 23:06:12,465 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4994890.0, ans=0.1
2024-08-20 23:06:13,887 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-20 23:06:23,911 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 34 from LS+wenet, 24 from Vox, 33 from AS
2024-08-20 23:06:26,644 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4994990.0, ans=0.2
2024-08-20 23:06:29,494 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 24 from LS+wenet, 20 from Vox, 40 from AS
2024-08-20 23:06:32,914 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 23 from LS+wenet, 18 from Vox, 43 from AS
2024-08-20 23:06:46,432 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.60 vs. limit=15.0
2024-08-20 23:06:49,020 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.90 vs. limit=15.0
2024-08-20 23:06:55,594 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4995090.0, ans=0.125
2024-08-20 23:06:58,686 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 10550, loss[loss=0.09116, beats_loss=0.01161, ecapa_loss=0.0001301, whisper_loss=0.07824, over 14054.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01033, ecapa_loss=0.00014, whisper_loss=0.0897, over 3810146.15 frames. ], batch size: 57, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 23:07:01,180 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4995190.0, ans=0.125
2024-08-20 23:07:09,642 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4995190.0, ans=0.1
2024-08-20 23:07:21,118 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 31 from LS+wenet, 14 from Vox, 25 from AS
2024-08-20 23:07:40,625 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4995390.0, ans=0.125
2024-08-20 23:07:46,018 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4995390.0, ans=0.2
2024-08-20 23:07:49,714 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4995490.0, ans=0.125
2024-08-20 23:07:55,708 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 20 from LS+wenet, 23 from Vox, 28 from AS
2024-08-20 23:07:59,155 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.287e+01 2.503e+01 2.760e+01 4.390e+01, threshold=5.007e+01, percent-clipped=0.0
2024-08-20 23:08:04,256 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 28 from LS+wenet, 28 from Vox, 35 from AS
2024-08-20 23:08:11,844 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 30 from LS+wenet, 12 from Vox, 45 from AS
2024-08-20 23:08:19,286 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4995590.0, ans=0.125
2024-08-20 23:08:25,597 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 10600, loss[loss=0.08806, beats_loss=0.01199, ecapa_loss=0.0001297, whisper_loss=0.07477, over 21468.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01034, ecapa_loss=0.0001384, whisper_loss=0.08969, over 3838299.52 frames. ], batch size: 88, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 23:08:40,554 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4995690.0, ans=0.0
2024-08-20 23:08:56,713 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4995790.0, ans=0.125
2024-08-20 23:09:02,490 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4995890.0, ans=0.125
2024-08-20 23:09:33,814 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4995990.0, ans=0.025
2024-08-20 23:09:37,338 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4995990.0, ans=0.0
2024-08-20 23:09:40,238 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 23 from LS+wenet, 28 from Vox, 42 from AS
2024-08-20 23:09:59,559 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 10650, loss[loss=0.115, beats_loss=0.008109, ecapa_loss=0.0001378, whisper_loss=0.1055, over 22461.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01037, ecapa_loss=0.0001386, whisper_loss=0.08902, over 3823338.45 frames. ], batch size: 86, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 23:10:01,507 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 22 from LS+wenet, 16 from Vox, 26 from AS
2024-08-20 23:10:10,971 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4996190.0, ans=0.2
2024-08-20 23:10:28,975 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4996290.0, ans=0.0
2024-08-20 23:10:38,393 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4996390.0, ans=0.125
2024-08-20 23:10:58,434 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4996490.0, ans=0.0
2024-08-20 23:10:59,467 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 25 from LS+wenet, 18 from Vox, 41 from AS
2024-08-20 23:11:03,587 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4996490.0, ans=0.0
2024-08-20 23:11:06,412 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.258e+01 2.484e+01 2.690e+01 6.351e+01, threshold=4.969e+01, percent-clipped=1.0
2024-08-20 23:11:07,698 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4996490.0, ans=0.0
2024-08-20 23:11:09,273 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4996490.0, ans=0.0
2024-08-20 23:11:12,362 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.009e+00
2024-08-20 23:11:12,439 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4996490.0, ans=0.125
2024-08-20 23:11:33,151 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 10700, loss[loss=0.1242, beats_loss=0.00929, ecapa_loss=0.0001947, whisper_loss=0.1129, over 21572.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01043, ecapa_loss=0.0001382, whisper_loss=0.08878, over 3814363.99 frames. ], batch size: 89, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 23:12:10,814 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4996890.0, ans=0.0
2024-08-20 23:12:27,694 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 30 from LS+wenet, 17 from Vox, 42 from AS
2024-08-20 23:12:40,127 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4996990.0, ans=0.125
2024-08-20 23:12:43,359 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4996990.0, ans=0.0
2024-08-20 23:12:54,673 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.66 vs. limit=15.0
2024-08-20 23:13:03,755 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.00 vs. limit=15.0
2024-08-20 23:13:06,186 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 10750, loss[loss=0.09219, beats_loss=0.011, ecapa_loss=0.0001441, whisper_loss=0.07975, over 20173.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01044, ecapa_loss=0.000138, whisper_loss=0.08896, over 3834462.07 frames. ], batch size: 84, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 23:13:08,886 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4997190.0, ans=0.125
2024-08-20 23:13:20,050 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4997190.0, ans=0.0
2024-08-20 23:13:55,258 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 32 from LS+wenet, 22 from Vox, 34 from AS
2024-08-20 23:14:07,272 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4997390.0, ans=0.125
2024-08-20 23:14:18,684 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.299e+01 2.568e+01 2.833e+01 1.794e+02, threshold=5.137e+01, percent-clipped=1.0
2024-08-20 23:14:23,741 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 19 from LS+wenet, 24 from Vox, 47 from AS
2024-08-20 23:14:34,928 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.10 vs. limit=6.0
2024-08-20 23:14:40,357 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 29 from LS+wenet, 17 from Vox, 35 from AS
2024-08-20 23:14:46,993 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 10800, loss[loss=0.0742, beats_loss=0.01359, ecapa_loss=0.0001076, whisper_loss=0.05953, over 12567.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01049, ecapa_loss=0.0001375, whisper_loss=0.08905, over 3837504.03 frames. ], batch size: 50, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 23:14:49,892 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4997690.0, ans=0.125
2024-08-20 23:14:53,524 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4997690.0, ans=0.1
2024-08-20 23:15:14,885 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 31 from LS+wenet, 24 from Vox, 35 from AS
2024-08-20 23:15:32,598 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 18 from LS+wenet, 16 from Vox, 17 from AS
2024-08-20 23:15:34,202 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4997890.0, ans=0.125
2024-08-20 23:15:42,462 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4997990.0, ans=0.0
2024-08-20 23:16:05,874 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4998090.0, ans=0.1
2024-08-20 23:16:13,295 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 19 from LS+wenet, 17 from Vox, 24 from AS
2024-08-20 23:16:19,743 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4998190.0, ans=0.125
2024-08-20 23:16:20,525 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 10850, loss[loss=0.09529, beats_loss=0.01025, ecapa_loss=0.0001176, whisper_loss=0.08387, over 16893.00 frames. ], tot_loss[loss=0.101, beats_loss=0.0104, ecapa_loss=0.000137, whisper_loss=0.08921, over 3835961.98 frames. ], batch size: 66, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 23:16:29,939 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 21 from LS+wenet, 23 from Vox, 35 from AS
2024-08-20 23:16:33,441 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4998190.0, ans=0.0
2024-08-20 23:16:40,282 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4998290.0, ans=0.125
2024-08-20 23:17:06,538 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4998390.0, ans=0.1
2024-08-20 23:17:24,044 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.807e+01 2.245e+01 2.613e+01 2.967e+01 9.176e+01, threshold=5.227e+01, percent-clipped=1.0
2024-08-20 23:17:30,979 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.31 vs. limit=22.5
2024-08-20 23:17:34,183 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4998590.0, ans=0.2
2024-08-20 23:17:36,254 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.46 vs. limit=15.0
2024-08-20 23:17:39,233 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 13 from LS+wenet, 13 from Vox, 36 from AS
2024-08-20 23:17:46,438 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 33 from LS+wenet, 23 from Vox, 35 from AS
2024-08-20 23:17:51,199 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 10900, loss[loss=0.08819, beats_loss=0.01015, ecapa_loss=0.0001394, whisper_loss=0.07665, over 19695.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.0104, ecapa_loss=0.000137, whisper_loss=0.08849, over 3814002.77 frames.
], batch size: 81, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:17:57,251 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4998690.0, ans=0.1 2024-08-20 23:18:02,579 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 31 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-20 23:18:03,018 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.38 vs. limit=22.5 2024-08-20 23:18:08,121 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4998790.0, ans=0.2 2024-08-20 23:18:08,210 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4998790.0, ans=0.125 2024-08-20 23:18:28,098 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4998890.0, ans=0.1 2024-08-20 23:18:41,878 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 23 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-20 23:18:45,592 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4998990.0, ans=0.125 2024-08-20 23:18:48,943 INFO [train_multi_KD3.py:845] (1/4) A total of 54 cuts. 13 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-20 23:19:08,922 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4999090.0, ans=0.0 2024-08-20 23:19:21,335 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 10950, loss[loss=0.1181, beats_loss=0.009449, ecapa_loss=0.000153, whisper_loss=0.1071, over 21872.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01037, ecapa_loss=0.0001382, whisper_loss=0.08913, over 3830190.09 frames. 
], batch size: 90, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:19:23,313 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 16 from LS+wenet, 8 from Vox, 27 fro AS 2024-08-20 23:19:52,076 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4999290.0, ans=0.125 2024-08-20 23:19:53,858 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4999290.0, ans=0.125 2024-08-20 23:20:10,803 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4999390.0, ans=0.125 2024-08-20 23:20:25,224 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.800e+01 2.300e+01 2.546e+01 2.869e+01 3.848e+01, threshold=5.092e+01, percent-clipped=0.0 2024-08-20 23:20:31,690 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 19 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-20 23:20:34,170 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.73 vs. limit=15.0 2024-08-20 23:20:40,812 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4999590.0, ans=0.125 2024-08-20 23:20:52,488 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 11000, loss[loss=0.1244, beats_loss=0.008873, ecapa_loss=0.000147, whisper_loss=0.1141, over 20776.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01038, ecapa_loss=0.0001382, whisper_loss=0.08954, over 3820111.39 frames. 
], batch size: 82, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:21:04,574 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4999690.0, ans=0.09899494936611666 2024-08-20 23:21:06,815 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.15 vs. limit=15.0 2024-08-20 23:21:37,324 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 23:21:43,461 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4999890.0, ans=0.1 2024-08-20 23:21:53,641 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 17 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-20 23:22:16,713 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 22 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-20 23:22:22,412 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=5000090.0, ans=0.125 2024-08-20 23:22:27,402 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 11050, loss[loss=0.1158, beats_loss=0.006201, ecapa_loss=0.0001858, whisper_loss=0.1078, over 13571.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0104, ecapa_loss=0.000139, whisper_loss=0.08964, over 3768550.29 frames. ], batch size: 51, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:22:51,382 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=5000290.0, ans=6.0 2024-08-20 23:23:07,169 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5000390.0, ans=0.1 2024-08-20 23:23:17,344 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 
20 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-20 23:23:19,265 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 23 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-20 23:23:25,277 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.90 vs. limit=15.0 2024-08-20 23:23:31,891 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 25 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-20 23:23:32,089 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=5000490.0, ans=0.125 2024-08-20 23:23:32,972 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.245e+01 2.536e+01 2.821e+01 3.723e+01, threshold=5.072e+01, percent-clipped=0.0 2024-08-20 23:23:44,319 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 32 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-20 23:24:01,821 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 11100, loss[loss=0.1252, beats_loss=0.009958, ecapa_loss=0.0001477, whisper_loss=0.1137, over 19718.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01039, ecapa_loss=0.0001391, whisper_loss=0.0903, over 3814769.99 frames. ], batch size: 77, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:24:28,483 INFO [train_multi_KD3.py:845] (1/4) A total of 52 cuts. 11 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-20 23:24:56,006 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.02 vs. limit=15.0 2024-08-20 23:25:03,674 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 
35 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-20 23:25:22,901 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=5001090.0, ans=0.125 2024-08-20 23:25:25,151 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=18.67 vs. limit=22.5 2024-08-20 23:25:35,286 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=5001090.0, ans=0.0 2024-08-20 23:25:40,219 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 11150, loss[loss=0.1008, beats_loss=0.01027, ecapa_loss=0.0001306, whisper_loss=0.08921, over 19218.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01032, ecapa_loss=0.0001391, whisper_loss=0.09055, over 3801898.33 frames. ], batch size: 75, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:25:48,360 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=5001190.0, ans=0.0 2024-08-20 23:25:59,037 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 27 from LS+wenet, 27 from Vox, 24 fro AS 2024-08-20 23:26:05,921 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.06 vs. limit=15.0 2024-08-20 23:26:06,741 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 19 from LS+wenet, 29 from Vox, 43 fro AS 2024-08-20 23:26:12,259 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.32 vs. limit=15.0 2024-08-20 23:26:14,462 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5001290.0, ans=0.125 2024-08-20 23:26:15,589 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 
33 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-20 23:26:33,553 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.48 vs. limit=15.0 2024-08-20 23:26:40,123 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=5001490.0, ans=0.0 2024-08-20 23:26:44,394 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.841e+01 2.389e+01 2.664e+01 3.018e+01 8.039e+01, threshold=5.328e+01, percent-clipped=1.0 2024-08-20 23:27:02,497 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 27 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-20 23:27:14,919 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 11200, loss[loss=0.1026, beats_loss=0.01357, ecapa_loss=0.0001206, whisper_loss=0.08778, over 22361.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01038, ecapa_loss=0.0001379, whisper_loss=0.0911, over 3860054.58 frames. ], batch size: 90, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:27:55,356 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 26 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-20 23:28:01,430 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.95 vs. limit=15.0 2024-08-20 23:28:08,086 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 
25 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-20 23:28:13,522 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5001990.0, ans=0.1 2024-08-20 23:28:13,611 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=5001990.0, ans=10.0 2024-08-20 23:28:20,850 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=5001990.0, ans=0.0 2024-08-20 23:28:44,643 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=5002090.0, ans=0.04949747468305833 2024-08-20 23:28:47,407 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 11250, loss[loss=0.1083, beats_loss=0.007838, ecapa_loss=0.0001664, whisper_loss=0.0988, over 15136.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01045, ecapa_loss=0.0001378, whisper_loss=0.09004, over 3866345.27 frames. ], batch size: 60, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:28:53,517 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5002190.0, ans=0.125 2024-08-20 23:28:55,336 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=5002190.0, ans=0.0 2024-08-20 23:28:56,968 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 
26 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-20 23:29:18,977 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=5002290.0, ans=0.0 2024-08-20 23:29:54,116 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.254e+01 2.512e+01 2.929e+01 3.894e+01, threshold=5.023e+01, percent-clipped=0.0 2024-08-20 23:29:56,484 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=5002490.0, ans=0.0 2024-08-20 23:29:58,399 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 12 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-20 23:30:22,465 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 11300, loss[loss=0.09324, beats_loss=0.01089, ecapa_loss=0.0001776, whisper_loss=0.08058, over 12074.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01046, ecapa_loss=0.0001375, whisper_loss=0.08988, over 3874030.54 frames. ], batch size: 51, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:30:57,083 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.55 vs. limit=6.0 2024-08-20 23:30:59,587 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.90 vs. limit=22.5 2024-08-20 23:31:26,549 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=5002990.0, ans=0.0 2024-08-20 23:31:50,095 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5003090.0, ans=0.125 2024-08-20 23:32:09,473 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 11350, loss[loss=0.09443, beats_loss=0.009944, ecapa_loss=0.0001027, whisper_loss=0.08346, over 14473.00 frames. 
], tot_loss[loss=0.1013, beats_loss=0.01049, ecapa_loss=0.0001378, whisper_loss=0.08941, over 3891461.60 frames. ], batch size: 54, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:32:10,010 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 23:32:27,010 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=5003290.0, ans=0.2 2024-08-20 23:32:27,049 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5003290.0, ans=0.125 2024-08-20 23:32:37,680 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5003290.0, ans=0.1 2024-08-20 23:32:50,150 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 26 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-20 23:32:57,147 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 37 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-20 23:33:16,381 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.703e+01 2.267e+01 2.559e+01 2.926e+01 1.468e+02, threshold=5.117e+01, percent-clipped=1.0 2024-08-20 23:33:26,999 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5003590.0, ans=0.1 2024-08-20 23:33:30,523 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=5003590.0, ans=0.0 2024-08-20 23:33:43,989 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 11400, loss[loss=0.1262, beats_loss=0.0078, ecapa_loss=0.0001341, whisper_loss=0.1171, over 14590.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01037, ecapa_loss=0.000137, whisper_loss=0.09015, over 3861646.35 frames. 
], batch size: 53, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:33:54,704 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 29 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-20 23:34:03,070 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.92 vs. limit=10.0 2024-08-20 23:34:18,728 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 22 from LS+wenet, 19 from Vox, 51 fro AS 2024-08-20 23:34:22,487 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=5003890.0, ans=0.5 2024-08-20 23:34:29,891 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 26 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-20 23:34:36,760 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=5003890.0, ans=0.0 2024-08-20 23:34:45,879 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 25 from LS+wenet, 10 from Vox, 46 fro AS 2024-08-20 23:34:56,538 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=5004090.0, ans=0.125 2024-08-20 23:35:00,353 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.97 vs. 
limit=15.0 2024-08-20 23:35:01,799 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=5004090.0, ans=0.125 2024-08-20 23:35:03,790 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5004090.0, ans=0.125 2024-08-20 23:35:08,785 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=5004090.0, ans=0.0 2024-08-20 23:35:12,135 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5004090.0, ans=0.125 2024-08-20 23:35:12,300 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.70 vs. limit=15.0 2024-08-20 23:35:13,454 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 22 from LS+wenet, 27 from Vox, 44 fro AS 2024-08-20 23:35:16,169 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 11450, loss[loss=0.1083, beats_loss=0.01123, ecapa_loss=0.0001274, whisper_loss=0.09583, over 19751.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01046, ecapa_loss=0.0001362, whisper_loss=0.08959, over 3854947.14 frames. ], batch size: 79, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:35:24,060 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=5004190.0, ans=0.2 2024-08-20 23:35:26,198 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5004190.0, ans=0.1 2024-08-20 23:35:26,382 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.48 vs. 
limit=15.0 2024-08-20 23:35:36,410 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5004290.0, ans=0.125 2024-08-20 23:36:31,464 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.263e+01 2.473e+01 2.920e+01 3.744e+01, threshold=4.946e+01, percent-clipped=0.0 2024-08-20 23:36:59,824 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 11500, loss[loss=0.1083, beats_loss=0.008239, ecapa_loss=0.0001339, whisper_loss=0.09871, over 21942.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01043, ecapa_loss=0.0001361, whisper_loss=0.08953, over 3839475.02 frames. ], batch size: 82, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:37:04,168 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=5004690.0, ans=0.09899494936611666 2024-08-20 23:37:17,265 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 19 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-20 23:37:22,669 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=5004790.0, ans=0.125 2024-08-20 23:37:40,104 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 16 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-20 23:37:59,315 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 24 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-20 23:38:01,141 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 
25 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-20 23:38:03,048 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=5004990.0, ans=0.125 2024-08-20 23:38:04,598 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=5004990.0, ans=0.125 2024-08-20 23:38:09,409 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=5004990.0, ans=0.125 2024-08-20 23:38:32,579 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5005090.0, ans=0.125 2024-08-20 23:38:35,623 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=5005090.0, ans=0.05 2024-08-20 23:38:38,278 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 11550, loss[loss=0.09366, beats_loss=0.008897, ecapa_loss=0.0001548, whisper_loss=0.08321, over 13597.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01039, ecapa_loss=0.0001361, whisper_loss=0.08937, over 3791382.95 frames. ], batch size: 53, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:38:51,552 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5005190.0, ans=0.125 2024-08-20 23:38:58,967 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 
32 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-20 23:39:10,784 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5005290.0, ans=0.125 2024-08-20 23:39:18,060 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5005390.0, ans=0.1 2024-08-20 23:39:19,683 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5005390.0, ans=0.125 2024-08-20 23:39:28,181 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.18 vs. limit=15.0 2024-08-20 23:39:37,795 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.30 vs. limit=15.0 2024-08-20 23:39:45,679 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.283e+01 2.508e+01 2.789e+01 4.307e+01, threshold=5.015e+01, percent-clipped=0.0 2024-08-20 23:39:45,945 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 28 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-20 23:39:51,627 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5005490.0, ans=0.1 2024-08-20 23:40:08,546 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5005590.0, ans=0.1 2024-08-20 23:40:16,509 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 11600, loss[loss=0.1146, beats_loss=0.01102, ecapa_loss=0.0001339, whisper_loss=0.1022, over 22180.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0104, ecapa_loss=0.0001369, whisper_loss=0.08971, over 3803374.88 frames. 
], batch size: 92, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:40:16,705 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 29 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-20 23:40:17,218 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.33 vs. limit=10.0 2024-08-20 23:40:27,509 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 11 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-20 23:40:34,125 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.04 vs. limit=15.0 2024-08-20 23:40:41,292 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.95 vs. limit=15.0 2024-08-20 23:40:57,703 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5005890.0, ans=0.125 2024-08-20 23:41:04,352 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5005890.0, ans=0.0 2024-08-20 23:41:30,978 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=5006090.0, ans=0.0 2024-08-20 23:41:32,536 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.42 vs. limit=10.0 2024-08-20 23:41:36,681 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.17 vs. 
limit=15.0 2024-08-20 23:41:38,081 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=5006090.0, ans=0.09899494936611666 2024-08-20 23:41:45,593 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5006090.0, ans=0.125 2024-08-20 23:41:46,009 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. limit=6.0 2024-08-20 23:41:50,154 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 11650, loss[loss=0.08915, beats_loss=0.009229, ecapa_loss=0.0001706, whisper_loss=0.07821, over 16290.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01045, ecapa_loss=0.0001372, whisper_loss=0.0889, over 3792228.94 frames. ], batch size: 65, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:41:50,394 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 27 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-20 23:42:09,768 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=3.943e-02 2024-08-20 23:42:27,622 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 
20 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-20 23:42:53,585 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.012e+01 2.472e+01 2.706e+01 2.968e+01 4.797e+01, threshold=5.413e+01, percent-clipped=0.0 2024-08-20 23:42:56,868 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5006490.0, ans=0.125 2024-08-20 23:43:07,917 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5006590.0, ans=0.125 2024-08-20 23:43:10,060 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=5006590.0, ans=15.0 2024-08-20 23:43:11,150 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5006590.0, ans=0.125 2024-08-20 23:43:21,202 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 11700, loss[loss=0.1137, beats_loss=0.009089, ecapa_loss=0.0001684, whisper_loss=0.1029, over 21661.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01043, ecapa_loss=0.0001379, whisper_loss=0.0899, over 3830233.45 frames. ], batch size: 90, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:43:22,408 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 37 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-20 23:43:38,854 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=5006690.0, ans=0.0 2024-08-20 23:43:54,833 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 32 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-20 23:44:08,626 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=5006890.0, ans=0.125 2024-08-20 23:44:25,333 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 
17 from LS+wenet, 14 from Vox, 22 from AS 2024-08-20 23:44:32,906 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=5006990.0, ans=0.125 2024-08-20 23:44:37,984 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.83 vs. limit=22.5 2024-08-20 23:44:57,128 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 11750, loss[loss=0.08731, beats_loss=0.01243, ecapa_loss=0.0001126, whisper_loss=0.07376, over 13747.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01041, ecapa_loss=0.0001368, whisper_loss=0.08988, over 3831986.44 frames. ], batch size: 55, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:45:10,468 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5007190.0, ans=0.1 2024-08-20 23:45:25,658 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=5007290.0, ans=0.05 2024-08-20 23:45:35,602 WARNING [optim.py:496] (1/4) Scaling gradients by 0.08465281873941422, model_norm_threshold=54.12553405761719 2024-08-20 23:45:35,763 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.1.norm.log_scale with proportion 0.07, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.880e+04, grad_sumsq=2.880e+04, orig_rms_sq=1.000e+00 2024-08-20 23:46:01,517 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.667e+01 2.339e+01 2.549e+01 2.970e+01 6.394e+02, threshold=5.099e+01, percent-clipped=1.0 2024-08-20 23:46:02,755 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=5007490.0, ans=0.05 2024-08-20 23:46:02,816 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob,
batch_count=5007490.0, ans=0.125 2024-08-20 23:46:24,475 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5007590.0, ans=0.125 2024-08-20 23:46:32,069 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 11800, loss[loss=0.09761, beats_loss=0.009758, ecapa_loss=0.0001533, whisper_loss=0.08632, over 17696.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01039, ecapa_loss=0.0001374, whisper_loss=0.09016, over 3825285.19 frames. ], batch size: 71, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:46:33,113 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=5007690.0, ans=0.2 2024-08-20 23:46:34,786 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 33 from LS+wenet, 22 from Vox, 30 from AS 2024-08-20 23:46:48,591 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=5007690.0, ans=0.0 2024-08-20 23:47:07,073 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.00 vs. limit=15.0 2024-08-20 23:47:19,091 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 20 from LS+wenet, 16 from Vox, 30 from AS 2024-08-20 23:47:21,005 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 15 from LS+wenet, 17 from Vox, 24 from AS 2024-08-20 23:47:45,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5007990.0, ans=0.125 2024-08-20 23:47:50,370 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 18 from LS+wenet, 15 from Vox, 20 from AS 2024-08-20 23:47:56,538 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 15 from LS+wenet, 19 from Vox, 23 from AS 2024-08-20 23:48:04,113 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts.
27 from LS+wenet, 16 from Vox, 42 from AS 2024-08-20 23:48:11,488 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 11850, loss[loss=0.08843, beats_loss=0.01114, ecapa_loss=0.000118, whisper_loss=0.07612, over 16119.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01039, ecapa_loss=0.0001376, whisper_loss=0.09029, over 3823779.53 frames. ], batch size: 63, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:48:11,945 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5008190.0, ans=0.125 2024-08-20 23:48:37,344 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 30 from LS+wenet, 29 from Vox, 35 from AS 2024-08-20 23:48:39,342 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=5008290.0, ans=0.05 2024-08-20 23:48:39,369 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=5008290.0, ans=0.2 2024-08-20 23:48:40,615 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 30 from LS+wenet, 18 from Vox, 43 from AS 2024-08-20 23:48:42,913 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 from AS 2024-08-20 23:48:51,006 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=5008390.0, ans=0.05 2024-08-20 23:48:59,948 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.02 vs. limit=15.0 2024-08-20 23:49:00,182 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.70 vs. limit=22.5 2024-08-20 23:49:16,234 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts.
20 from LS+wenet, 17 from Vox, 26 from AS 2024-08-20 23:49:20,961 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.687e+01 2.312e+01 2.598e+01 2.861e+01 4.202e+01, threshold=5.196e+01, percent-clipped=0.0 2024-08-20 23:49:46,733 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=5008590.0, ans=0.125 2024-08-20 23:49:51,654 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 11900, loss[loss=0.1195, beats_loss=0.01009, ecapa_loss=0.0001357, whisper_loss=0.1081, over 20025.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01041, ecapa_loss=0.0001376, whisper_loss=0.0907, over 3821095.13 frames. ], batch size: 79, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:50:26,815 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=5008790.0, ans=0.0 2024-08-20 23:50:32,319 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.13 vs. limit=15.0 2024-08-20 23:50:39,684 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 24 from LS+wenet, 16 from Vox, 28 from AS 2024-08-20 23:51:06,634 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 26 from LS+wenet, 13 from Vox, 31 from AS 2024-08-20 23:51:13,984 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 25 from LS+wenet, 24 from Vox, 41 from AS 2024-08-20 23:51:29,139 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 11950, loss[loss=0.08016, beats_loss=0.01091, ecapa_loss=0.0001375, whisper_loss=0.06788, over 17833.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01045, ecapa_loss=0.0001383, whisper_loss=0.0909, over 3820072.13 frames.
], batch size: 73, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:51:35,047 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=5009190.0, ans=0.0 2024-08-20 23:51:37,352 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 22 from LS+wenet, 16 from Vox, 40 from AS 2024-08-20 23:51:42,216 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 18 from LS+wenet, 17 from Vox, 23 from AS 2024-08-20 23:51:59,892 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=8.72 vs. limit=12.0 2024-08-20 23:52:01,408 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=5009290.0, ans=0.04949747468305833 2024-08-20 23:52:05,385 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5009290.0, ans=0.125 2024-08-20 23:52:08,954 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 26 from LS+wenet, 22 from Vox, 21 from AS 2024-08-20 23:52:16,098 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 32 from LS+wenet, 24 from Vox, 37 from AS 2024-08-20 23:52:40,549 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.309e+01 2.519e+01 2.758e+01 3.846e+01, threshold=5.038e+01, percent-clipped=0.0 2024-08-20 23:52:45,276 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.56 vs. limit=15.0 2024-08-20 23:53:01,358 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.41 vs. limit=15.0 2024-08-20 23:53:02,673 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.36 vs.
limit=15.0 2024-08-20 23:53:07,259 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 12000, loss[loss=0.1066, beats_loss=0.008571, ecapa_loss=0.0001277, whisper_loss=0.09674, over 18953.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01041, ecapa_loss=0.0001385, whisper_loss=0.0906, over 3805014.33 frames. ], batch size: 74, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:53:07,259 INFO [train_multi_KD3.py:1140] (1/4) Computing validation loss 2024-08-20 23:53:44,065 INFO [train_multi_KD3.py:1150] (1/4) Epoch 34, validation on ASR_libri: loss=0.2573, beats_loss=0, ecapa_loss=0.0005075, whisper_loss=0.2522, over 931116.00 frames. 2024-08-20 23:54:09,124 INFO [train_multi_KD3.py:1150] (1/4) Epoch 34, validation on SV_voxceleb1: loss=0.003964, beats_loss=0, ecapa_loss=0.0003964, whisper_loss=0, over 944235.00 frames. 2024-08-20 23:55:45,698 INFO [train_multi_KD3.py:1150] (1/4) Epoch 34, validation on AT_audioset: loss=0.02298, beats_loss=0.02298, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-20 23:55:45,702 INFO [train_multi_KD3.py:1156] (1/4) Maximum memory allocated so far is 30838MB 2024-08-20 23:55:54,314 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=5009690.0, ans=0.0 2024-08-20 23:56:04,083 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 31 from LS+wenet, 27 from Vox, 31 from AS 2024-08-20 23:56:13,710 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=5009790.0, ans=0.125 2024-08-20 23:56:22,428 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 20 from LS+wenet, 22 from Vox, 36 from AS 2024-08-20 23:56:24,490 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.91 vs.
limit=15.0 2024-08-20 23:56:28,043 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=5009890.0, ans=0.0 2024-08-20 23:56:31,086 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=5009890.0, ans=0.0 2024-08-20 23:56:32,737 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5009890.0, ans=0.125 2024-08-20 23:56:34,893 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=5009890.0, ans=0.0 2024-08-20 23:56:45,322 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=5009990.0, ans=0.2 2024-08-20 23:56:46,973 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=5009990.0, ans=0.2 2024-08-20 23:56:55,249 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5010090.0, ans=0.125 2024-08-20 23:57:05,446 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5010090.0, ans=0.125 2024-08-20 23:57:05,512 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5010090.0, ans=0.0 2024-08-20 23:57:10,181 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=5010190.0, ans=0.0 2024-08-20 23:57:11,219 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 12050, loss[loss=0.12, beats_loss=0.008676, ecapa_loss=0.0001325, whisper_loss=0.11, over 22629.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01032, ecapa_loss=0.0001397, whisper_loss=0.09111, over 3810859.79 frames. 
], batch size: 89, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-20 23:57:37,571 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=5010290.0, ans=0.015 2024-08-20 23:57:42,551 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 30 from LS+wenet, 25 from Vox, 37 from AS 2024-08-20 23:57:44,559 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=5010290.0, ans=0.0 2024-08-20 23:57:49,997 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.45 vs. limit=15.0 2024-08-20 23:58:06,295 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 19 from LS+wenet, 26 from Vox, 40 from AS 2024-08-20 23:58:14,123 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.275e+01 2.516e+01 2.859e+01 1.031e+02, threshold=5.032e+01, percent-clipped=2.0 2024-08-20 23:58:28,429 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=5010590.0, ans=0.0 2024-08-20 23:58:28,648 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.41 vs. limit=10.0 2024-08-20 23:58:29,576 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 25 from LS+wenet, 25 from Vox, 42 from AS 2024-08-20 23:58:39,939 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 12100, loss[loss=0.08678, beats_loss=0.01104, ecapa_loss=0.0001559, whisper_loss=0.07418, over 14257.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01037, ecapa_loss=0.0001396, whisper_loss=0.09022, over 3809412.11 frames.
], batch size: 59, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-20 23:58:44,501 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=5010690.0, ans=0.0 2024-08-20 23:59:02,679 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5010790.0, ans=0.125 2024-08-20 23:59:27,839 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 20 from LS+wenet, 21 from Vox, 31 from AS 2024-08-20 23:59:28,110 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=5010890.0, ans=0.125 2024-08-20 23:59:35,752 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 23 from LS+wenet, 16 from Vox, 22 from AS 2024-08-20 23:59:49,137 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5010990.0, ans=0.1 2024-08-21 00:00:12,309 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 12150, loss[loss=0.1014, beats_loss=0.008955, ecapa_loss=0.0001757, whisper_loss=0.09069, over 18752.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01032, ecapa_loss=0.00014, whisper_loss=0.09055, over 3818194.92 frames.
], batch size: 78, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:00:17,401 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=5011190.0, ans=0.0 2024-08-21 00:00:51,560 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=5011390.0, ans=0.0 2024-08-21 00:01:00,662 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=5011390.0, ans=0.2 2024-08-21 00:01:00,715 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5011390.0, ans=0.125 2024-08-21 00:01:08,110 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 23 from LS+wenet, 19 from Vox, 28 from AS 2024-08-21 00:01:11,700 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=5011490.0, ans=0.0 2024-08-21 00:01:14,145 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.49 vs. limit=15.0 2024-08-21 00:01:18,406 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.230e+01 2.539e+01 2.959e+01 2.449e+02, threshold=5.079e+01, percent-clipped=2.0 2024-08-21 00:01:22,905 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=5011490.0, ans=0.0 2024-08-21 00:01:27,927 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts.
25 from LS+wenet, 19 from Vox, 39 from AS 2024-08-21 00:01:32,571 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5011590.0, ans=0.125 2024-08-21 00:01:46,164 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 12200, loss[loss=0.09447, beats_loss=0.01269, ecapa_loss=0.0001054, whisper_loss=0.08073, over 22463.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01028, ecapa_loss=0.0001405, whisper_loss=0.09042, over 3801859.63 frames. ], batch size: 89, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:01:59,707 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 23 from LS+wenet, 22 from Vox, 45 from AS 2024-08-21 00:02:09,516 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=5011790.0, ans=0.2 2024-08-21 00:02:09,527 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=5011790.0, ans=0.125 2024-08-21 00:02:35,599 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=5011890.0, ans=0.125 2024-08-21 00:02:36,120 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.24 vs. limit=15.0 2024-08-21 00:02:56,252 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 26 from LS+wenet, 14 from Vox, 36 from AS 2024-08-21 00:02:58,937 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=5011990.0, ans=0.0 2024-08-21 00:03:18,661 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5012090.0, ans=0.125 2024-08-21 00:03:22,456 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts.
16 from LS+wenet, 22 from Vox, 23 from AS 2024-08-21 00:03:30,852 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 19 from LS+wenet, 17 from Vox, 26 from AS 2024-08-21 00:03:34,187 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 12250, loss[loss=0.08567, beats_loss=0.012, ecapa_loss=0.0001047, whisper_loss=0.07262, over 16529.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0103, ecapa_loss=0.0001406, whisper_loss=0.09007, over 3793212.51 frames. ], batch size: 63, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:03:34,817 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.93 vs. limit=15.0 2024-08-21 00:03:56,255 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 28 from LS+wenet, 14 from Vox, 25 from AS 2024-08-21 00:04:09,422 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5012290.0, ans=0.1 2024-08-21 00:04:09,452 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5012290.0, ans=0.125 2024-08-21 00:04:10,512 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 23 from LS+wenet, 23 from Vox, 32 from AS 2024-08-21 00:04:12,319 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5012390.0, ans=0.1 2024-08-21 00:04:12,323 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=5012390.0, ans=0.2 2024-08-21 00:04:27,067 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts.
32 from LS+wenet, 19 from Vox, 41 from AS 2024-08-21 00:04:38,660 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.682e+01 2.216e+01 2.475e+01 2.860e+01 1.621e+02, threshold=4.950e+01, percent-clipped=3.0 2024-08-21 00:04:57,810 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.34 vs. limit=15.0 2024-08-21 00:05:05,776 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 12300, loss[loss=0.09009, beats_loss=0.01036, ecapa_loss=0.0001286, whisper_loss=0.07844, over 23022.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01034, ecapa_loss=0.00014, whisper_loss=0.08957, over 3784844.34 frames. ], batch size: 92, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:05:05,999 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 22 from LS+wenet, 21 from Vox, 35 from AS 2024-08-21 00:05:14,884 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5012690.0, ans=0.1 2024-08-21 00:05:27,362 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 25 from LS+wenet, 15 from Vox, 23 from AS 2024-08-21 00:05:27,561 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=5012790.0, ans=0.125 2024-08-21 00:05:33,818 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 19 from LS+wenet, 21 from Vox, 36 from AS 2024-08-21 00:05:42,553 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 29 from LS+wenet, 26 from Vox, 35 from AS 2024-08-21 00:06:28,355 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts.
22 from LS+wenet, 21 from Vox, 32 from AS 2024-08-21 00:06:30,423 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=5013090.0, ans=0.125 2024-08-21 00:06:43,073 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 12350, loss[loss=0.08157, beats_loss=0.01018, ecapa_loss=0.0001759, whisper_loss=0.06964, over 11815.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01037, ecapa_loss=0.0001385, whisper_loss=0.08974, over 3805607.10 frames. ], batch size: 54, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:06:51,808 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5013190.0, ans=0.125 2024-08-21 00:07:08,070 WARNING [optim.py:496] (1/4) Scaling gradients by 0.0743364468216896, model_norm_threshold=49.50318145751953 2024-08-21 00:07:08,227 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.10, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.289e+04, grad_sumsq=4.289e+04, orig_rms_sq=1.000e+00 2024-08-21 00:07:14,521 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5013290.0, ans=0.0 2024-08-21 00:07:19,970 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 23 from LS+wenet, 15 from Vox, 24 from AS 2024-08-21 00:07:20,211 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=5013390.0, ans=0.0 2024-08-21 00:07:24,942 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=5013390.0, ans=0.035 2024-08-21 00:07:29,725 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts.
19 from LS+wenet, 16 from Vox, 32 from AS 2024-08-21 00:07:35,625 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=5013490.0, ans=0.0 2024-08-21 00:07:45,208 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.746e+01 2.283e+01 2.548e+01 2.937e+01 6.659e+02, threshold=5.096e+01, percent-clipped=4.0 2024-08-21 00:07:53,620 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. limit=6.0 2024-08-21 00:08:12,653 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 12400, loss[loss=0.1017, beats_loss=0.01063, ecapa_loss=0.0001522, whisper_loss=0.08952, over 22042.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01037, ecapa_loss=0.0001382, whisper_loss=0.09009, over 3819789.40 frames. ], batch size: 91, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:08:16,460 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 24 from LS+wenet, 31 from Vox, 34 from AS 2024-08-21 00:08:16,992 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=5013690.0, ans=0.125 2024-08-21 00:08:43,185 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5013790.0, ans=0.125 2024-08-21 00:09:33,971 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 17 from LS+wenet, 12 from Vox, 35 from AS 2024-08-21 00:09:46,685 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5014190.0, ans=0.125 2024-08-21 00:09:47,372 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 12450, loss[loss=0.09928, beats_loss=0.01029, ecapa_loss=0.0001381, whisper_loss=0.08761, over 13803.00 frames.
], tot_loss[loss=0.1011, beats_loss=0.01043, ecapa_loss=0.0001375, whisper_loss=0.08932, over 3804832.12 frames. ], batch size: 56, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:10:07,774 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=5014290.0, ans=0.2 2024-08-21 00:10:11,382 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5014290.0, ans=0.125 2024-08-21 00:10:23,646 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=5014390.0, ans=0.0 2024-08-21 00:10:27,088 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5014390.0, ans=0.0 2024-08-21 00:10:48,758 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 26 from LS+wenet, 26 from Vox, 37 from AS 2024-08-21 00:10:51,941 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.269e+01 2.484e+01 2.743e+01 3.672e+01, threshold=4.968e+01, percent-clipped=0.0 2024-08-21 00:11:03,842 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=5014590.0, ans=0.05 2024-08-21 00:11:06,987 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 31 from LS+wenet, 25 from Vox, 33 from AS 2024-08-21 00:11:19,802 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 12500, loss[loss=0.1027, beats_loss=0.01122, ecapa_loss=0.000124, whisper_loss=0.0902, over 22808.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01039, ecapa_loss=0.000137, whisper_loss=0.08949, over 3806088.78 frames.
], batch size: 90, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:11:41,833 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=5014790.0, ans=0.0 2024-08-21 00:11:45,160 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 18 from LS+wenet, 12 from Vox, 29 from AS 2024-08-21 00:11:49,578 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=5014790.0, ans=0.0 2024-08-21 00:11:51,301 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5014790.0, ans=0.125 2024-08-21 00:12:07,036 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=5014890.0, ans=0.035 2024-08-21 00:12:10,266 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=5014890.0, ans=0.2 2024-08-21 00:12:10,512 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.66 vs. limit=22.5 2024-08-21 00:12:34,058 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 22 from LS+wenet, 7 from Vox, 26 from AS 2024-08-21 00:12:48,772 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5015090.0, ans=0.1 2024-08-21 00:12:52,581 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=5015090.0, ans=0.125 2024-08-21 00:12:55,671 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 12550, loss[loss=0.1052, beats_loss=0.009419, ecapa_loss=0.0001065, whisper_loss=0.09475, over 14734.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01036, ecapa_loss=0.0001377, whisper_loss=0.08926, over 3780187.50 frames.
], batch size: 53, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:13:07,889 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=5015190.0, ans=0.0 2024-08-21 00:13:19,713 INFO [train_multi_KD3.py:845] (1/4) A total of 95 cuts. 31 from LS+wenet, 20 from Vox, 44 from AS 2024-08-21 00:13:31,393 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.25 vs. limit=22.5 2024-08-21 00:13:41,590 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=5015390.0, ans=0.0 2024-08-21 00:13:46,778 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=5015390.0, ans=0.125 2024-08-21 00:13:58,691 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5015490.0, ans=0.125 2024-08-21 00:14:00,191 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.650e+01 2.267e+01 2.470e+01 2.808e+01 4.015e+01, threshold=4.940e+01, percent-clipped=0.0 2024-08-21 00:14:11,325 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=5015590.0, ans=0.2 2024-08-21 00:14:21,333 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5015590.0, ans=0.1 2024-08-21 00:14:26,722 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts.
22 from LS+wenet, 15 from Vox, 31 from AS
2024-08-21 00:14:26,935 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=5015690.0, ans=0.0
2024-08-21 00:14:28,234 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 12600, loss[loss=0.1036, beats_loss=0.0119, ecapa_loss=0.0001267, whisper_loss=0.09046, over 16807.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01035, ecapa_loss=0.0001384, whisper_loss=0.08987, over 3797857.59 frames. ], batch size: 68, lr: 1.79e-03, grad_scale: 5.764607523034235e+17
2024-08-21 00:14:29,826 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 22 from LS+wenet, 19 from Vox, 39 from AS
2024-08-21 00:14:34,217 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.81 vs. limit=15.0
2024-08-21 00:14:40,772 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 22 from LS+wenet, 20 from Vox, 15 from AS
2024-08-21 00:14:42,042 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.06 vs. limit=6.0
2024-08-21 00:14:52,512 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 14 from LS+wenet, 23 from Vox, 38 from AS
2024-08-21 00:15:08,739 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 37 from LS+wenet, 20 from Vox, 36 from AS
2024-08-21 00:15:13,441 WARNING [optim.py:496] (1/4) Scaling gradients by 0.00705720903351903, model_norm_threshold=49.39711380004883
2024-08-21 00:15:13,599 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.17, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.280e+06, grad_sumsq=7.678e+08, orig_rms_sq=1.078e-02
2024-08-21 00:15:26,419 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=5015990.0, ans=0.0
2024-08-21 00:15:34,909 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5015990.0, ans=0.125
2024-08-21 00:15:36,456 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 34 from LS+wenet, 23 from Vox, 33 from AS
2024-08-21 00:16:01,764 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 12650, loss[loss=0.107, beats_loss=0.01034, ecapa_loss=0.0001385, whisper_loss=0.09524, over 21259.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01037, ecapa_loss=0.0001379, whisper_loss=0.09029, over 3826444.79 frames. ], batch size: 88, lr: 1.79e-03, grad_scale: 5.764607523034235e+17
2024-08-21 00:16:13,413 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.42 vs. limit=22.5
2024-08-21 00:16:40,333 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=5016390.0, ans=0.125
2024-08-21 00:16:42,180 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=5016390.0, ans=0.2
2024-08-21 00:16:58,969 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5016490.0, ans=0.125
2024-08-21 00:17:02,594 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=5016490.0, ans=0.2
2024-08-21 00:17:03,666 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 21 from LS+wenet, 11 from Vox, 29 from AS
2024-08-21 00:17:07,124 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.324e+01 2.529e+01 2.803e+01 7.000e+03, threshold=5.059e+01, percent-clipped=4.0
2024-08-21 00:17:31,208 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 27 from LS+wenet, 28 from Vox, 31 from AS
2024-08-21 00:17:31,701 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-21 00:17:38,229 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 12700, loss[loss=0.1155, beats_loss=0.009755, ecapa_loss=0.0001255, whisper_loss=0.1045, over 19077.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01037, ecapa_loss=0.0001379, whisper_loss=0.09009, over 3800209.90 frames. ], batch size: 72, lr: 1.79e-03, grad_scale: 5.764607523034235e+17
2024-08-21 00:17:41,494 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 25 from LS+wenet, 24 from Vox, 25 from AS
2024-08-21 00:17:43,998 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=5016690.0, ans=0.2
2024-08-21 00:17:56,047 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=5016790.0, ans=0.2
2024-08-21 00:18:06,869 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5016790.0, ans=0.125
2024-08-21 00:18:09,077 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=5016790.0, ans=0.0
2024-08-21 00:18:22,847 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=5016890.0, ans=0.2
2024-08-21 00:18:25,962 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.078e-01
2024-08-21 00:18:27,314 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5016890.0, ans=0.0
2024-08-21 00:18:30,856 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=5016890.0, ans=0.07
2024-08-21 00:18:34,552 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.01 vs. limit=22.5
2024-08-21 00:18:51,987 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=5017090.0, ans=0.0
2024-08-21 00:19:11,152 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 12750, loss[loss=0.1026, beats_loss=0.01005, ecapa_loss=0.0001573, whisper_loss=0.09097, over 18049.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01032, ecapa_loss=0.0001378, whisper_loss=0.09035, over 3814958.52 frames. ], batch size: 73, lr: 1.79e-03, grad_scale: 5.764607523034235e+17
2024-08-21 00:19:11,721 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=5017190.0, ans=0.2
2024-08-21 00:19:29,190 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5017290.0, ans=0.1
2024-08-21 00:19:32,016 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=5017290.0, ans=0.0
2024-08-21 00:19:33,418 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=5017290.0, ans=0.05
2024-08-21 00:19:37,275 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 20 from LS+wenet, 15 from Vox, 31 from AS
2024-08-21 00:20:19,283 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.263e+01 2.506e+01 2.738e+01 4.032e+01, threshold=5.011e+01, percent-clipped=0.0
2024-08-21 00:20:31,940 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=5017590.0, ans=0.025
2024-08-21 00:20:42,770 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 25 from LS+wenet, 17 from Vox, 24 from AS
2024-08-21 00:20:46,988 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 12800, loss[loss=0.08537, beats_loss=0.012, ecapa_loss=0.0001411, whisper_loss=0.07196, over 20178.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01031, ecapa_loss=0.0001376, whisper_loss=0.0903, over 3859060.16 frames. ], batch size: 85, lr: 1.79e-03, grad_scale: 5.764607523034235e+17
2024-08-21 00:20:58,582 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 23 from LS+wenet, 13 from Vox, 23 from AS
2024-08-21 00:20:58,794 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5017690.0, ans=0.125
2024-08-21 00:21:21,011 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 28 from LS+wenet, 18 from Vox, 47 from AS
2024-08-21 00:21:27,960 INFO [train_multi_KD3.py:845] (1/4) A total of 49 cuts. 14 from LS+wenet, 9 from Vox, 26 from AS
2024-08-21 00:21:38,363 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 28 from LS+wenet, 13 from Vox, 21 from AS
2024-08-21 00:21:53,296 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=5017990.0, ans=0.0
2024-08-21 00:22:06,937 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 19 from LS+wenet, 18 from Vox, 37 from AS
2024-08-21 00:22:29,458 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 12850, loss[loss=0.08676, beats_loss=0.01144, ecapa_loss=0.0001518, whisper_loss=0.0738, over 18549.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01035, ecapa_loss=0.0001377, whisper_loss=0.09025, over 3823270.17 frames. ], batch size: 79, lr: 1.79e-03, grad_scale: 5.764607523034235e+17
2024-08-21 00:23:02,016 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts.
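The `loss[…]` and `tot_loss[…]` records decompose the total loss into its three distillation components. With the scales from the run configuration in the header (`beats_loss_scale=1.0`, `ecapa_loss_scale=10.0`), the logged `loss=` field is consistent with a weighted sum; a small sketch (the `whisper_scale=1.0` default is an assumption inferred from the logged numbers, not taken from the config):

```python
def total_kd_loss(beats_loss, ecapa_loss, whisper_loss,
                  beats_scale=1.0, ecapa_scale=10.0, whisper_scale=1.0):
    """Recombine the per-task KD losses the way the log's `loss=` field
    appears to: beats and whisper at scale 1.0, ecapa at scale 10.0
    (matching beats_loss_scale / ecapa_loss_scale in the header)."""
    return (beats_scale * beats_loss
            + ecapa_scale * ecapa_loss
            + whisper_scale * whisper_loss)

# e.g. the batch 12550 record: 0.01036 + 10 * 0.0001377 + 0.08926 ≈ 0.101
```

Checking a couple of records confirms the weighting: for batch 12550, `0.009419 + 10 * 0.0001065 + 0.09475 ≈ 0.1052`, matching `loss=0.1052` to the printed precision.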
24 from LS+wenet, 22 from Vox, 45 from AS
2024-08-21 00:23:24,997 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=5018390.0, ans=0.125
2024-08-21 00:23:28,394 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5018490.0, ans=0.1
2024-08-21 00:23:38,777 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.266e+01 2.519e+01 2.755e+01 3.962e+01, threshold=5.039e+01, percent-clipped=0.0
2024-08-21 00:23:58,215 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=5018590.0, ans=0.2
2024-08-21 00:24:12,760 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 12900, loss[loss=0.09618, beats_loss=0.01143, ecapa_loss=0.0001334, whisper_loss=0.08342, over 21885.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01044, ecapa_loss=0.0001382, whisper_loss=0.08992, over 3816672.38 frames. ], batch size: 89, lr: 1.79e-03, grad_scale: 5.764607523034235e+17
2024-08-21 00:25:10,301 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.18 vs. limit=15.0
2024-08-21 00:25:11,182 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 20 from LS+wenet, 15 from Vox, 30 from AS
2024-08-21 00:25:17,428 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 30 from LS+wenet, 22 from Vox, 36 from AS
2024-08-21 00:25:23,110 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 29 from LS+wenet, 21 from Vox, 36 from AS
2024-08-21 00:25:39,445 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=5019090.0, ans=0.0
2024-08-21 00:25:44,438 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 26 from LS+wenet, 27 from Vox, 32 from AS
2024-08-21 00:25:47,323 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.41 vs. limit=15.0
2024-08-21 00:25:48,067 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 12950, loss[loss=0.08759, beats_loss=0.01313, ecapa_loss=0.0001244, whisper_loss=0.07322, over 20179.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01049, ecapa_loss=0.0001383, whisper_loss=0.09036, over 3823522.99 frames. ], batch size: 83, lr: 1.79e-03, grad_scale: 5.764607523034235e+17
2024-08-21 00:25:48,581 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=5019190.0, ans=0.2
2024-08-21 00:25:52,534 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 20 from LS+wenet, 23 from Vox, 30 from AS
2024-08-21 00:26:09,175 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 28 from LS+wenet, 14 from Vox, 33 from AS
2024-08-21 00:26:20,705 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 30 from LS+wenet, 25 from Vox, 39 from AS
2024-08-21 00:26:26,480 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-21 00:26:38,784 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5019390.0, ans=0.1
2024-08-21 00:26:39,764 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 25 from LS+wenet, 26 from Vox, 27 from AS
2024-08-21 00:26:59,479 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.575e+01 2.232e+01 2.418e+01 2.707e+01 3.600e+01, threshold=4.835e+01, percent-clipped=0.0
2024-08-21 00:27:33,073 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 13000, loss[loss=0.08638, beats_loss=0.008216, ecapa_loss=0.0001775, whisper_loss=0.07639, over 13031.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01049, ecapa_loss=0.0001395, whisper_loss=0.09054, over 3810912.80 frames. ], batch size: 53, lr: 1.79e-03, grad_scale: 5.764607523034235e+17
2024-08-21 00:27:40,081 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 28 from LS+wenet, 25 from Vox, 40 from AS
2024-08-21 00:27:47,781 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5019690.0, ans=0.125
2024-08-21 00:28:01,225 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 19 from LS+wenet, 17 from Vox, 31 from AS
2024-08-21 00:28:29,924 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5019890.0, ans=0.125
2024-08-21 00:28:45,911 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=5019990.0, ans=0.05
2024-08-21 00:28:53,104 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.61 vs. limit=15.0
2024-08-21 00:29:10,463 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 13050, loss[loss=0.1094, beats_loss=0.008027, ecapa_loss=0.0001327, whisper_loss=0.1001, over 15259.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01049, ecapa_loss=0.0001392, whisper_loss=0.09027, over 3796015.59 frames. ], batch size: 58, lr: 1.79e-03, grad_scale: 5.764607523034235e+17
2024-08-21 00:29:20,036 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=5020190.0, ans=0.0
2024-08-21 00:29:23,720 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-21 00:29:43,653 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-21 00:29:56,074 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5020390.0, ans=0.1
2024-08-21 00:30:07,527 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5020490.0, ans=0.1
2024-08-21 00:30:15,768 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.250e+01 2.442e+01 2.810e+01 6.229e+01, threshold=4.884e+01, percent-clipped=1.0
2024-08-21 00:30:34,708 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 15 from LS+wenet, 15 from Vox, 28 from AS
2024-08-21 00:30:44,116 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 13100, loss[loss=0.09814, beats_loss=0.00914, ecapa_loss=0.0001447, whisper_loss=0.08756, over 16719.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01049, ecapa_loss=0.0001407, whisper_loss=0.08957, over 3769610.93 frames. ], batch size: 66, lr: 1.79e-03, grad_scale: 5.764607523034235e+17
2024-08-21 00:31:22,695 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 29 from LS+wenet, 16 from Vox, 38 from AS
2024-08-21 00:31:22,952 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5020890.0, ans=0.125
2024-08-21 00:31:30,469 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 17 from LS+wenet, 21 from Vox, 28 from AS
2024-08-21 00:31:44,843 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 17 from LS+wenet, 13 from Vox, 23 from AS
2024-08-21 00:31:55,248 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5020990.0, ans=0.1
2024-08-21 00:32:07,374 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 33 from LS+wenet, 22 from Vox, 32 from AS
2024-08-21 00:32:13,908 WARNING [optim.py:496] (1/4) Scaling gradients by 0.09580767154693604, model_norm_threshold=48.835636138916016
2024-08-21 00:32:14,067 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.18, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.692e+04, grad_sumsq=4.692e+04, orig_rms_sq=1.000e+00
2024-08-21 00:32:16,264 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=5021090.0, ans=0.0
2024-08-21 00:32:17,956 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 16 from LS+wenet, 12 from Vox, 27 from AS
2024-08-21 00:32:19,229 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 13150, loss[loss=0.09338, beats_loss=0.01122, ecapa_loss=0.0001044, whisper_loss=0.08111, over 14207.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01049, ecapa_loss=0.0001409, whisper_loss=0.08918, over 3711938.60 frames. ], batch size: 55, lr: 1.79e-03, grad_scale: 5.764607523034235e+17
2024-08-21 00:32:27,022 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts.
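The `WARNING … Scaling gradients by 0.0958…, model_norm_threshold=48.83…` records correspond to norm-based clipping: when the global gradient norm exceeds the threshold, every gradient is multiplied by `threshold / norm`, and that multiplier is what the log reports. A minimal, framework-free sketch of the behaviour (an illustrative reconstruction, not the actual optimizer code; `grads` as nested float lists is a stand-in for parameter tensors):

```python
import math

def scale_gradients(grads, model_norm_threshold):
    """grads: list of per-parameter gradient lists (floats).

    Returns (scaled_grads, scale). When the global L2 norm exceeds
    model_norm_threshold, all gradients are scaled by threshold / norm
    (the factor printed as "Scaling gradients by ..."); otherwise they
    are returned unchanged with scale 1.0.
    """
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    if total_norm > model_norm_threshold:
        scale = model_norm_threshold / total_norm
        return [[g * scale for g in grad] for grad in grads], scale
    return grads, 1.0
```

The companion `Parameter dominating tot_sumsq …` line then names which single parameter contributed the largest share of the squared norm, which here points at `module.encoder_embed.out_norm.log_scale` as the source of the spike.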
30 from LS+wenet, 15 from Vox, 36 from AS
2024-08-21 00:33:22,060 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.398e+01 2.549e+01 2.951e+01 5.097e+02, threshold=5.098e+01, percent-clipped=2.0
2024-08-21 00:33:22,491 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5021490.0, ans=0.0
2024-08-21 00:33:26,859 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=5021490.0, ans=0.04949747468305833
2024-08-21 00:33:27,947 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 20 from LS+wenet, 10 from Vox, 25 from AS
2024-08-21 00:33:30,806 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.29 vs. limit=15.0
2024-08-21 00:33:32,047 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 23 from LS+wenet, 12 from Vox, 34 from AS
2024-08-21 00:33:38,683 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=5021590.0, ans=0.125
2024-08-21 00:33:52,579 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 13200, loss[loss=0.1017, beats_loss=0.01309, ecapa_loss=0.0001186, whisper_loss=0.08745, over 23093.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01047, ecapa_loss=0.00014, whisper_loss=0.08938, over 3704075.48 frames. ], batch size: 93, lr: 1.79e-03, grad_scale: 5.764607523034235e+17
2024-08-21 00:34:17,998 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 27 from LS+wenet, 23 from Vox, 34 from AS
2024-08-21 00:34:26,067 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 24 from LS+wenet, 15 from Vox, 20 from AS
2024-08-21 00:34:40,871 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5021890.0, ans=0.125
2024-08-21 00:34:48,775 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5021890.0, ans=0.1
2024-08-21 00:34:52,804 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 17 from LS+wenet, 11 from Vox, 42 from AS
2024-08-21 00:35:00,316 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.79 vs. limit=15.0
2024-08-21 00:35:08,724 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 23 from LS+wenet, 22 from Vox, 45 from AS
2024-08-21 00:35:14,942 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5022090.0, ans=0.125
2024-08-21 00:35:34,092 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 13250, loss[loss=0.1175, beats_loss=0.008216, ecapa_loss=0.000143, whisper_loss=0.1078, over 20268.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01039, ecapa_loss=0.0001395, whisper_loss=0.0894, over 3733289.57 frames. ], batch size: 81, lr: 1.79e-03, grad_scale: 5.764607523034235e+17
2024-08-21 00:35:37,230 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.90 vs. limit=15.0
2024-08-21 00:35:41,608 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5022190.0, ans=0.1
2024-08-21 00:35:46,899 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5022190.0, ans=0.1
2024-08-21 00:35:47,213 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.80 vs. limit=22.5
2024-08-21 00:35:50,677 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5022290.0, ans=0.1
2024-08-21 00:35:54,188 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 26 from LS+wenet, 19 from Vox, 36 from AS
2024-08-21 00:36:05,548 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=5022290.0, ans=0.025
2024-08-21 00:36:26,158 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=5022490.0, ans=0.125
2024-08-21 00:36:26,193 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=5022490.0, ans=0.0
2024-08-21 00:36:37,132 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.371e+01 2.560e+01 2.906e+01 3.702e+02, threshold=5.119e+01, percent-clipped=3.0
2024-08-21 00:36:56,360 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 19 from LS+wenet, 24 from Vox, 26 from AS
2024-08-21 00:37:08,375 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 13300, loss[loss=0.1025, beats_loss=0.01091, ecapa_loss=0.0001112, whisper_loss=0.09046, over 18313.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.0104, ecapa_loss=0.0001397, whisper_loss=0.08932, over 3767963.49 frames. ], batch size: 74, lr: 1.79e-03, grad_scale: 5.764607523034235e+17
2024-08-21 00:37:14,334 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 15 from LS+wenet, 15 from Vox, 29 from AS
2024-08-21 00:37:27,181 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=5022790.0, ans=0.125
2024-08-21 00:37:35,728 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5022790.0, ans=0.125
2024-08-21 00:37:49,122 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 31 from LS+wenet, 20 from Vox, 24 from AS
2024-08-21 00:37:49,307 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=5022890.0, ans=0.04949747468305833
2024-08-21 00:38:00,149 INFO [train_multi_KD3.py:845] (1/4) A total of 95 cuts. 28 from LS+wenet, 24 from Vox, 43 from AS
2024-08-21 00:38:02,655 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.55 vs. limit=15.0
2024-08-21 00:38:25,786 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.95 vs. limit=6.0
2024-08-21 00:38:43,911 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 13350, loss[loss=0.1027, beats_loss=0.009741, ecapa_loss=0.0001692, whisper_loss=0.09127, over 19097.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01032, ecapa_loss=0.0001404, whisper_loss=0.0891, over 3740024.62 frames. ], batch size: 81, lr: 1.79e-03, grad_scale: 5.764607523034235e+17
2024-08-21 00:38:46,188 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=5023190.0, ans=0.0
2024-08-21 00:39:02,426 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=5023290.0, ans=0.125
2024-08-21 00:39:20,910 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=5023390.0, ans=0.07
2024-08-21 00:39:49,234 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.315e+01 2.502e+01 2.886e+01 3.923e+01, threshold=5.005e+01, percent-clipped=0.0
2024-08-21 00:39:52,273 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=5023490.0, ans=0.125
2024-08-21 00:39:54,841 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 22 from LS+wenet, 19 from Vox, 32 from AS
2024-08-21 00:40:01,846 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.78 vs. limit=22.5
2024-08-21 00:40:18,737 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 13400, loss[loss=0.09672, beats_loss=0.01171, ecapa_loss=0.0001398, whisper_loss=0.08361, over 17548.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0103, ecapa_loss=0.0001392, whisper_loss=0.08994, over 3761252.35 frames. ], batch size: 69, lr: 1.79e-03, grad_scale: 5.764607523034235e+17
2024-08-21 00:40:35,496 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=5023790.0, ans=0.125
2024-08-21 00:40:47,274 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5023790.0, ans=0.125
2024-08-21 00:40:54,386 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts.
24 from LS+wenet, 19 from Vox, 43 from AS
2024-08-21 00:41:03,076 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=5023890.0, ans=0.0
2024-08-21 00:41:04,592 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 8 from LS+wenet, 17 from Vox, 30 from AS
2024-08-21 00:41:15,142 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 27 from LS+wenet, 21 from Vox, 34 from AS
2024-08-21 00:41:20,819 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=5023990.0, ans=0.125
2024-08-21 00:41:26,607 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 19 from LS+wenet, 14 from Vox, 31 from AS
2024-08-21 00:41:28,532 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 26 from LS+wenet, 24 from Vox, 36 from AS
2024-08-21 00:41:39,016 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 17 from LS+wenet, 14 from Vox, 28 from AS
2024-08-21 00:41:46,263 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5024190.0, ans=0.1
2024-08-21 00:41:47,587 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 13450, loss[loss=0.09467, beats_loss=0.01108, ecapa_loss=0.0001206, whisper_loss=0.08239, over 18305.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.0103, ecapa_loss=0.0001384, whisper_loss=0.08945, over 3771738.74 frames. ], batch size: 75, lr: 1.79e-03, grad_scale: 5.764607523034235e+17
2024-08-21 00:41:51,898 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.55 vs. limit=6.0
2024-08-21 00:42:01,987 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 20 from LS+wenet, 19 from Vox, 30 from AS
2024-08-21 00:42:11,558 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=5024290.0, ans=0.125
2024-08-21 00:42:16,036 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.68 vs. limit=15.0
2024-08-21 00:42:18,370 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-21 00:42:34,352 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5024390.0, ans=0.0
2024-08-21 00:42:37,249 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 27 from LS+wenet, 23 from Vox, 26 from AS
2024-08-21 00:42:51,897 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.706e+01 2.233e+01 2.383e+01 2.688e+01 3.683e+01, threshold=4.765e+01, percent-clipped=0.0
2024-08-21 00:43:10,959 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 25 from LS+wenet, 16 from Vox, 39 from AS
2024-08-21 00:43:22,049 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5024690.0, ans=0.125
2024-08-21 00:43:22,787 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 13500, loss[loss=0.1035, beats_loss=0.01008, ecapa_loss=0.0001238, whisper_loss=0.0922, over 23513.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01027, ecapa_loss=0.0001384, whisper_loss=0.08978, over 3774478.81 frames. ], batch size: 91, lr: 1.79e-03, grad_scale: 5.764607523034235e+17
2024-08-21 00:43:44,980 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=5024790.0, ans=0.125
2024-08-21 00:43:46,643 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 22 from LS+wenet, 13 from Vox, 27 from AS
2024-08-21 00:43:49,194 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.06 vs. limit=12.0
2024-08-21 00:43:56,295 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 25 from LS+wenet, 16 from Vox, 23 from AS
2024-08-21 00:43:59,947 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5024890.0, ans=0.125
2024-08-21 00:44:09,304 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5024890.0, ans=0.1
2024-08-21 00:44:23,719 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.02 vs. limit=15.0
2024-08-21 00:44:30,853 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=5024990.0, ans=0.0
2024-08-21 00:44:59,202 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 13550, loss[loss=0.1067, beats_loss=0.009081, ecapa_loss=0.0001599, whisper_loss=0.09604, over 22988.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01031, ecapa_loss=0.000138, whisper_loss=0.08992, over 3762486.32 frames. ], batch size: 91, lr: 1.79e-03, grad_scale: 5.764607523034235e+17
2024-08-21 00:45:13,472 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 30 from LS+wenet, 18 from Vox, 24 from AS
2024-08-21 00:45:29,460 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=5025290.0, ans=0.0
2024-08-21 00:45:34,735 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=5025290.0, ans=0.0
2024-08-21 00:45:35,703 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 28 from LS+wenet, 27 from Vox, 37 from AS
2024-08-21 00:45:41,295 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 27 from LS+wenet, 30 from Vox, 35 from AS
2024-08-21 00:46:07,717 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.237e+01 2.540e+01 2.882e+01 4.860e+01, threshold=5.081e+01, percent-clipped=1.0
2024-08-21 00:46:23,610 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.04 vs. limit=15.0
2024-08-21 00:46:26,704 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=5025590.0, ans=0.0
2024-08-21 00:46:32,578 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5025590.0, ans=0.125
2024-08-21 00:46:34,074 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5025690.0, ans=0.0
2024-08-21 00:46:34,834 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 13600, loss[loss=0.1028, beats_loss=0.009028, ecapa_loss=0.0001304, whisper_loss=0.09249, over 14885.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01024, ecapa_loss=0.000139, whisper_loss=0.09022, over 3779647.09 frames. ], batch size: 57, lr: 1.79e-03, grad_scale: 5.764607523034235e+17
2024-08-21 00:46:43,509 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=8.66 vs. limit=12.0
2024-08-21 00:47:08,922 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.16 vs. limit=15.0
2024-08-21 00:47:17,147 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5025890.0, ans=0.125
2024-08-21 00:47:23,877 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=5025890.0, ans=0.125
2024-08-21 00:47:49,856 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 24 from LS+wenet, 17 from Vox, 37 from AS
2024-08-21 00:47:57,035 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=5026090.0, ans=0.0
2024-08-21 00:48:09,541 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 13650, loss[loss=0.1196, beats_loss=0.008245, ecapa_loss=0.0001231, whisper_loss=0.1102, over 22656.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01029, ecapa_loss=0.0001387, whisper_loss=0.08974, over 3776109.39 frames. ], batch size: 86, lr: 1.79e-03, grad_scale: 5.764607523034235e+17
2024-08-21 00:48:48,921 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 15 from LS+wenet, 22 from Vox, 26 from AS
2024-08-21 00:48:57,847 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5026390.0, ans=0.125
2024-08-21 00:49:03,897 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5026390.0, ans=0.125
2024-08-21 00:49:07,859 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 30 from LS+wenet, 20 from Vox, 30 from AS
2024-08-21 00:49:22,984 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.304e+01 2.539e+01 2.805e+01 5.664e+01, threshold=5.078e+01, percent-clipped=1.0
2024-08-21 00:49:29,997 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.06 vs.
limit=5.0 2024-08-21 00:49:45,536 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 15 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-21 00:49:52,285 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 13700, loss[loss=0.1083, beats_loss=0.01057, ecapa_loss=0.000128, whisper_loss=0.09645, over 17019.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01029, ecapa_loss=0.0001393, whisper_loss=0.08957, over 3773612.07 frames. ], batch size: 67, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:50:28,204 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=5026790.0, ans=0.125 2024-08-21 00:50:33,978 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=5026890.0, ans=0.125 2024-08-21 00:50:44,941 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 36 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-21 00:51:04,123 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=5026990.0, ans=0.2 2024-08-21 00:51:18,733 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5027090.0, ans=0.125 2024-08-21 00:51:27,214 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=5027090.0, ans=0.125 2024-08-21 00:51:32,320 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 13750, loss[loss=0.0774, beats_loss=0.01445, ecapa_loss=0.0001089, whisper_loss=0.06186, over 18900.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01033, ecapa_loss=0.0001387, whisper_loss=0.08991, over 3796638.30 frames. 
], batch size: 76, lr: 1.78e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:51:57,188 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=5027290.0, ans=0.1 2024-08-21 00:51:58,334 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 32 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-21 00:52:02,532 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 14 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-21 00:52:38,262 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=5027490.0, ans=0.2 2024-08-21 00:52:45,990 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.353e+01 2.699e+01 3.002e+01 5.030e+02, threshold=5.398e+01, percent-clipped=2.0 2024-08-21 00:52:54,978 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=5027490.0, ans=0.125 2024-08-21 00:52:57,472 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=5027590.0, ans=15.0 2024-08-21 00:53:16,609 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 13800, loss[loss=0.1135, beats_loss=0.009062, ecapa_loss=0.0001791, whisper_loss=0.1027, over 21090.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01033, ecapa_loss=0.0001392, whisper_loss=0.09046, over 3845573.25 frames. ], batch size: 93, lr: 1.78e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:53:31,346 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.53 vs. 
limit=22.5 2024-08-21 00:53:36,755 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=5027790.0, ans=0.125 2024-08-21 00:53:44,409 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=5027790.0, ans=0.2 2024-08-21 00:53:44,479 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5027790.0, ans=0.125 2024-08-21 00:53:51,739 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5027790.0, ans=0.125 2024-08-21 00:53:53,441 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.52 vs. limit=22.5 2024-08-21 00:54:14,508 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=5027990.0, ans=0.0 2024-08-21 00:54:26,010 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 28 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-21 00:54:33,007 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5028090.0, ans=0.125 2024-08-21 00:54:49,434 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 13850, loss[loss=0.1174, beats_loss=0.00911, ecapa_loss=0.0001614, whisper_loss=0.1066, over 23327.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01031, ecapa_loss=0.0001405, whisper_loss=0.09002, over 3815832.83 frames. 
], batch size: 95, lr: 1.78e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:54:50,078 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=5028190.0, ans=0.125 2024-08-21 00:55:13,449 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5028290.0, ans=0.125 2024-08-21 00:55:17,480 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=5028290.0, ans=22.5 2024-08-21 00:55:28,104 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=5028390.0, ans=0.2 2024-08-21 00:55:58,145 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.193e+01 2.425e+01 2.661e+01 8.724e+01, threshold=4.850e+01, percent-clipped=1.0 2024-08-21 00:56:06,999 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=5028590.0, ans=0.04949747468305833 2024-08-21 00:56:10,589 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=5028590.0, ans=0.0 2024-08-21 00:56:17,733 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=5028590.0, ans=0.125 2024-08-21 00:56:19,670 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.61 vs. limit=22.5 2024-08-21 00:56:25,530 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 13900, loss[loss=0.09993, beats_loss=0.0101, ecapa_loss=0.0001568, whisper_loss=0.08826, over 14310.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01031, ecapa_loss=0.0001409, whisper_loss=0.09053, over 3809555.97 frames. 
], batch size: 60, lr: 1.78e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:56:49,619 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5028790.0, ans=0.0 2024-08-21 00:56:55,921 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=5028790.0, ans=0.0 2024-08-21 00:57:04,310 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 36 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-21 00:57:46,394 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=5029090.0, ans=0.0 2024-08-21 00:57:53,221 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.60 vs. limit=15.0 2024-08-21 00:57:53,873 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 32 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-21 00:57:57,548 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 13950, loss[loss=0.09729, beats_loss=0.01099, ecapa_loss=0.0001115, whisper_loss=0.08519, over 23108.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01027, ecapa_loss=0.0001411, whisper_loss=0.09127, over 3822896.31 frames. ], batch size: 88, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 00:58:11,424 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 21 from LS+wenet, 30 from Vox, 41 fro AS 2024-08-21 00:58:14,437 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=5029190.0, ans=0.09899494936611666 2024-08-21 00:58:14,667 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.10 vs. limit=10.0 2024-08-21 00:58:28,339 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 
17 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-21 00:58:32,193 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 31 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-21 00:58:32,580 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=5029290.0, ans=0.0 2024-08-21 00:58:41,718 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 28 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-21 00:58:51,490 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5029390.0, ans=0.1 2024-08-21 00:58:59,922 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 28 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-21 00:59:02,937 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.99 vs. limit=12.0 2024-08-21 00:59:12,613 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.747e+01 2.187e+01 2.492e+01 2.707e+01 4.586e+01, threshold=4.984e+01, percent-clipped=0.0 2024-08-21 00:59:13,712 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5029490.0, ans=0.1 2024-08-21 00:59:14,701 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 31 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-21 00:59:15,577 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.33 vs. limit=15.0 2024-08-21 00:59:19,916 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.94 vs. 
limit=15.0 2024-08-21 00:59:28,046 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=5029590.0, ans=0.0 2024-08-21 00:59:43,119 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 14000, loss[loss=0.1094, beats_loss=0.009566, ecapa_loss=0.0001266, whisper_loss=0.09853, over 20154.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01031, ecapa_loss=0.0001409, whisper_loss=0.09113, over 3828939.43 frames. ], batch size: 78, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:00:07,923 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=5029790.0, ans=0.025 2024-08-21 01:00:25,908 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.33 vs. limit=15.0 2024-08-21 01:00:30,749 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=5029890.0, ans=0.2 2024-08-21 01:00:36,627 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=5029890.0, ans=0.0 2024-08-21 01:01:10,841 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5030090.0, ans=0.125 2024-08-21 01:01:14,154 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=5030090.0, ans=0.0 2024-08-21 01:01:22,645 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=5030090.0, ans=0.125 2024-08-21 01:01:30,780 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 14050, loss[loss=0.1026, beats_loss=0.01005, ecapa_loss=0.0001834, whisper_loss=0.09068, over 22028.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01023, ecapa_loss=0.0001414, whisper_loss=0.09115, over 3809809.83 frames. 
], batch size: 95, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:01:36,343 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 19 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-21 01:01:58,503 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5030290.0, ans=0.125 2024-08-21 01:02:04,309 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 26 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-21 01:02:42,135 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.263e+01 2.516e+01 2.757e+01 1.194e+02, threshold=5.032e+01, percent-clipped=1.0 2024-08-21 01:03:13,524 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 14100, loss[loss=0.07865, beats_loss=0.01427, ecapa_loss=0.00012, whisper_loss=0.06318, over 22001.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01027, ecapa_loss=0.0001395, whisper_loss=0.09102, over 3831902.67 frames. ], batch size: 91, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:03:19,353 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5030690.0, ans=0.1 2024-08-21 01:03:27,797 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5030690.0, ans=0.125 2024-08-21 01:04:02,393 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 32 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-21 01:04:06,263 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 22 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-21 01:04:12,241 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 
30 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-21 01:04:14,527 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=5030990.0, ans=0.025 2024-08-21 01:04:27,432 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=5030990.0, ans=0.2 2024-08-21 01:04:50,691 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 14150, loss[loss=0.1142, beats_loss=0.007127, ecapa_loss=0.0001645, whisper_loss=0.1054, over 19421.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01028, ecapa_loss=0.0001388, whisper_loss=0.09118, over 3848752.00 frames. ], batch size: 79, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:05:02,599 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=5031190.0, ans=0.125 2024-08-21 01:05:21,951 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=5031290.0, ans=0.125 2024-08-21 01:05:37,567 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=5031390.0, ans=0.0 2024-08-21 01:05:45,742 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 
27 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-21 01:05:54,963 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5031490.0, ans=0.125 2024-08-21 01:06:02,316 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.620e+01 2.285e+01 2.566e+01 2.926e+01 5.021e+02, threshold=5.132e+01, percent-clipped=5.0 2024-08-21 01:06:12,593 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.whiten.whitening_limit, batch_count=5031590.0, ans=12.0 2024-08-21 01:06:16,618 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=5031590.0, ans=0.125 2024-08-21 01:06:19,694 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=5031590.0, ans=0.125 2024-08-21 01:06:29,081 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=5031590.0, ans=0.125 2024-08-21 01:06:35,450 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 14200, loss[loss=0.09705, beats_loss=0.01128, ecapa_loss=0.0001311, whisper_loss=0.08447, over 19255.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01028, ecapa_loss=0.0001383, whisper_loss=0.0913, over 3862122.25 frames. ], batch size: 77, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:06:38,829 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 31 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-21 01:06:56,357 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=5031790.0, ans=0.125 2024-08-21 01:06:59,970 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 19 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-21 01:07:01,702 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 
20 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-21 01:07:03,740 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=5031790.0, ans=0.125 2024-08-21 01:07:05,691 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=5031790.0, ans=0.125 2024-08-21 01:07:13,421 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=5031890.0, ans=0.5 2024-08-21 01:07:19,014 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=5031890.0, ans=0.125 2024-08-21 01:07:21,692 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=5031890.0, ans=0.0 2024-08-21 01:07:28,126 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-21 01:07:33,893 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.81 vs. limit=15.0 2024-08-21 01:08:03,725 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=5032090.0, ans=0.1 2024-08-21 01:08:08,708 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 29 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-21 01:08:09,672 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 14250, loss[loss=0.1024, beats_loss=0.01, ecapa_loss=0.0001441, whisper_loss=0.09091, over 22234.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01027, ecapa_loss=0.0001386, whisper_loss=0.09131, over 3887900.00 frames. 
], batch size: 91, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:08:25,450 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5032190.0, ans=0.125 2024-08-21 01:08:30,675 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 17 from LS+wenet, 11 from Vox, 22 fro AS 2024-08-21 01:09:00,351 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 21 from LS+wenet, 18 from Vox, 17 fro AS 2024-08-21 01:09:07,935 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=5032490.0, ans=0.125 2024-08-21 01:09:11,281 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 21 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-21 01:09:16,015 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.45 vs. limit=12.0 2024-08-21 01:09:16,628 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.235e+01 2.454e+01 2.805e+01 6.761e+01, threshold=4.908e+01, percent-clipped=2.0 2024-08-21 01:09:42,151 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 14300, loss[loss=0.108, beats_loss=0.007346, ecapa_loss=0.0001976, whisper_loss=0.09865, over 21306.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01026, ecapa_loss=0.0001381, whisper_loss=0.09066, over 3862263.07 frames. ], batch size: 93, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:09:54,234 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=5032690.0, ans=0.0 2024-08-21 01:10:06,797 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5032790.0, ans=0.1 2024-08-21 01:10:09,914 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 
23 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-21 01:10:11,572 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 22 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-21 01:10:15,273 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 31 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-21 01:10:19,720 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0 2024-08-21 01:10:22,747 INFO [train_multi_KD3.py:845] (1/4) A total of 52 cuts. 13 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-21 01:10:26,619 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 26 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-21 01:10:27,016 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=5032890.0, ans=0.0 2024-08-21 01:10:39,243 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 20 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-21 01:11:03,921 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 21 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-21 01:11:04,494 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.65 vs. limit=6.0 2024-08-21 01:11:18,142 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 14350, loss[loss=0.09885, beats_loss=0.01153, ecapa_loss=0.0001453, whisper_loss=0.08587, over 20609.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01025, ecapa_loss=0.0001387, whisper_loss=0.09028, over 3824132.94 frames. ], batch size: 83, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:11:23,234 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 
28 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-21 01:12:04,788 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=5033390.0, ans=0.125 2024-08-21 01:12:05,250 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.09 vs. limit=15.0 2024-08-21 01:12:07,596 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=5033390.0, ans=0.0 2024-08-21 01:12:16,552 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 20 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-21 01:12:16,793 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-21 01:12:18,488 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=5033490.0, ans=0.2 2024-08-21 01:12:23,663 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=5033490.0, ans=0.09899494936611666 2024-08-21 01:12:24,439 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.751e+01 2.295e+01 2.538e+01 2.825e+01 4.751e+01, threshold=5.075e+01, percent-clipped=0.0 2024-08-21 01:12:36,717 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=5033590.0, ans=0.0 2024-08-21 01:12:48,963 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5033690.0, ans=0.125 2024-08-21 01:12:49,773 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 14400, loss[loss=0.09957, beats_loss=0.0125, ecapa_loss=0.0001268, whisper_loss=0.0858, over 22921.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01035, ecapa_loss=0.0001387, whisper_loss=0.08997, over 3829861.71 frames. 
], batch size: 94, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:12:50,554 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=5033690.0, ans=0.2 2024-08-21 01:13:06,413 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.74 vs. limit=22.5 2024-08-21 01:13:36,777 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 18 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-21 01:13:53,503 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.35 vs. limit=15.0 2024-08-21 01:14:00,785 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.83 vs. limit=15.0 2024-08-21 01:14:13,581 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=5034090.0, ans=0.125 2024-08-21 01:14:13,599 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5034090.0, ans=0.125 2024-08-21 01:14:25,555 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5034090.0, ans=0.125 2024-08-21 01:14:32,004 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 14450, loss[loss=0.1052, beats_loss=0.0103, ecapa_loss=0.0001504, whisper_loss=0.09342, over 22917.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0104, ecapa_loss=0.0001378, whisper_loss=0.08948, over 3795714.86 frames. 
], batch size: 92, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:14:33,811 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=5034190.0, ans=0.125 2024-08-21 01:14:35,210 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 12 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-21 01:14:40,376 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 16 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-21 01:14:42,695 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5034190.0, ans=0.125 2024-08-21 01:14:44,641 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=5034190.0, ans=0.035 2024-08-21 01:14:48,504 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=5034290.0, ans=0.07 2024-08-21 01:15:40,195 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.806e+01 2.267e+01 2.442e+01 2.819e+01 1.713e+02, threshold=4.884e+01, percent-clipped=1.0 2024-08-21 01:15:48,588 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=5034590.0, ans=0.125 2024-08-21 01:16:05,791 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 14500, loss[loss=0.1024, beats_loss=0.009768, ecapa_loss=0.000134, whisper_loss=0.09128, over 18566.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01036, ecapa_loss=0.0001374, whisper_loss=0.08952, over 3789186.77 frames. ], batch size: 74, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:16:16,762 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 
22 from LS+wenet, 21 from Vox, 22 from AS 2024-08-21 01:16:49,266 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=5034890.0, ans=0.0 2024-08-21 01:17:07,113 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 19 from LS+wenet, 11 from Vox, 25 from AS 2024-08-21 01:17:12,479 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.28 vs. limit=15.0 2024-08-21 01:17:22,877 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 14 from LS+wenet, 16 from Vox, 30 from AS 2024-08-21 01:17:36,776 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 23 from LS+wenet, 23 from Vox, 29 from AS 2024-08-21 01:17:44,107 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 14550, loss[loss=0.08264, beats_loss=0.01169, ecapa_loss=0.0001489, whisper_loss=0.06946, over 21313.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01039, ecapa_loss=0.0001378, whisper_loss=0.08987, over 3786639.80 frames. ], batch size: 88, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:17:48,051 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 32 from LS+wenet, 26 from Vox, 36 from AS 2024-08-21 01:17:53,608 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5035190.0, ans=0.125 2024-08-21 01:18:17,049 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.66 vs. limit=15.0 2024-08-21 01:18:32,184 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5035390.0, ans=0.1 2024-08-21 01:18:45,326 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 
16 from LS+wenet, 19 from Vox, 33 from AS 2024-08-21 01:18:51,176 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=5035490.0, ans=0.125 2024-08-21 01:18:54,127 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.744e+01 2.274e+01 2.528e+01 2.801e+01 4.515e+02, threshold=5.056e+01, percent-clipped=2.0 2024-08-21 01:19:06,097 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 34 from LS+wenet, 19 from Vox, 38 from AS 2024-08-21 01:19:06,678 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.53 vs. limit=15.0 2024-08-21 01:19:21,680 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 14600, loss[loss=0.07078, beats_loss=0.01148, ecapa_loss=0.0001291, whisper_loss=0.05801, over 12450.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01037, ecapa_loss=0.0001379, whisper_loss=0.08976, over 3823305.93 frames. ], batch size: 50, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:19:37,348 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=5035690.0, ans=0.2 2024-08-21 01:19:39,882 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.03 vs. limit=22.5 2024-08-21 01:20:36,043 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=5035990.0, ans=0.0 2024-08-21 01:20:56,984 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 14650, loss[loss=0.1207, beats_loss=0.008026, ecapa_loss=0.0001479, whisper_loss=0.1112, over 22615.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01035, ecapa_loss=0.0001386, whisper_loss=0.08963, over 3826760.63 frames. 
], batch size: 85, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:21:04,373 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.30 vs. limit=10.0 2024-08-21 01:21:08,154 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 14 from LS+wenet, 12 from Vox, 27 from AS 2024-08-21 01:21:30,396 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 18 from LS+wenet, 22 from Vox, 25 from AS 2024-08-21 01:21:32,980 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.74 vs. limit=15.0 2024-08-21 01:21:37,473 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 16 from LS+wenet, 16 from Vox, 21 from AS 2024-08-21 01:21:50,081 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.62 vs. limit=15.0 2024-08-21 01:21:52,623 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 27 from LS+wenet, 22 from Vox, 40 from AS 2024-08-21 01:21:58,050 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=5036490.0, ans=0.2 2024-08-21 01:22:01,277 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=5036490.0, ans=0.2 2024-08-21 01:22:02,499 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 
30 from LS+wenet, 27 from Vox, 37 from AS 2024-08-21 01:22:06,127 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.288e+01 2.569e+01 2.805e+01 8.601e+01, threshold=5.137e+01, percent-clipped=2.0 2024-08-21 01:22:22,781 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5036590.0, ans=0.125 2024-08-21 01:22:32,566 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 14700, loss[loss=0.08059, beats_loss=0.0144, ecapa_loss=0.0001595, whisper_loss=0.0646, over 12885.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01033, ecapa_loss=0.0001387, whisper_loss=0.08992, over 3818726.72 frames. ], batch size: 56, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:22:40,066 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.21 vs. limit=10.0 2024-08-21 01:22:45,483 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=5036690.0, ans=0.5 2024-08-21 01:23:23,815 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=5036890.0, ans=0.2 2024-08-21 01:23:27,475 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5036990.0, ans=0.125 2024-08-21 01:23:42,803 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=5036990.0, ans=0.2 2024-08-21 01:23:48,318 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5037090.0, ans=0.125 2024-08-21 01:24:08,875 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 14750, loss[loss=0.09967, beats_loss=0.01159, ecapa_loss=0.0001477, whisper_loss=0.0866, over 18380.00 frames. 
], tot_loss[loss=0.1009, beats_loss=0.01038, ecapa_loss=0.0001396, whisper_loss=0.0891, over 3798143.77 frames. ], batch size: 74, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:24:19,468 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5037190.0, ans=0.125 2024-08-21 01:24:22,787 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=5037190.0, ans=0.0 2024-08-21 01:24:38,870 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 19 from LS+wenet, 31 from Vox, 39 from AS 2024-08-21 01:24:45,658 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.080e+01 2024-08-21 01:25:03,427 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 9 from LS+wenet, 24 from Vox, 20 from AS 2024-08-21 01:25:15,998 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5037490.0, ans=0.125 2024-08-21 01:25:20,570 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.178e+01 2.451e+01 2.819e+01 4.132e+01, threshold=4.902e+01, percent-clipped=0.0 2024-08-21 01:25:46,286 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=5037690.0, ans=0.2 2024-08-21 01:25:47,090 INFO [train_multi_KD3.py:1117] (1/4) Epoch 34, batch 14800, loss[loss=0.1038, beats_loss=0.009328, ecapa_loss=0.0001279, whisper_loss=0.09317, over 16589.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01044, ecapa_loss=0.0001388, whisper_loss=0.08896, over 3806287.81 frames. ], batch size: 61, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:25:54,141 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 
23 from LS+wenet, 18 from Vox, 26 from AS 2024-08-21 01:26:25,276 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 0, loss[loss=0.09163, beats_loss=0.01013, ecapa_loss=0.0001506, whisper_loss=0.08, over 19036.00 frames. ], tot_loss[loss=0.09163, beats_loss=0.01013, ecapa_loss=0.0001506, whisper_loss=0.08, over 19036.00 frames. ], batch size: 79, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:26:25,277 INFO [train_multi_KD3.py:1140] (1/4) Computing validation loss 2024-08-21 01:27:00,167 INFO [train_multi_KD3.py:1150] (1/4) Epoch 35, validation on ASR_libri: loss=0.2538, beats_loss=0, ecapa_loss=0.0005038, whisper_loss=0.2488, over 931116.00 frames. 2024-08-21 01:27:14,417 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.1891, 2.7481, 2.5669, 2.2328], device='cuda:1') 2024-08-21 01:27:22,667 INFO [train_multi_KD3.py:1150] (1/4) Epoch 35, validation on SV_voxceleb1: loss=0.003936, beats_loss=0, ecapa_loss=0.0003936, whisper_loss=0, over 944235.00 frames. 2024-08-21 01:28:47,658 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6730, 3.8448, 4.4201, 4.4833], device='cuda:1') 2024-08-21 01:28:59,216 INFO [train_multi_KD3.py:1150] (1/4) Epoch 35, validation on AT_audioset: loss=0.02305, beats_loss=0.02305, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-21 01:28:59,219 INFO [train_multi_KD3.py:1156] (1/4) Maximum memory allocated so far is 30838MB 2024-08-21 01:29:34,234 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 28 from LS+wenet, 22 from Vox, 30 from AS 2024-08-21 01:29:34,807 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.60 vs. 
limit=15.0 2024-08-21 01:29:44,736 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=5037850.0, ans=0.0 2024-08-21 01:30:24,883 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 34 from LS+wenet, 34 from Vox, 25 from AS 2024-08-21 01:30:25,131 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5038050.0, ans=0.125 2024-08-21 01:30:36,916 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 22 from LS+wenet, 16 from Vox, 21 from AS 2024-08-21 01:30:38,883 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 25 from LS+wenet, 28 from Vox, 25 from AS 2024-08-21 01:30:46,062 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 28 from LS+wenet, 27 from Vox, 38 from AS 2024-08-21 01:30:48,820 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=5038150.0, ans=0.125 2024-08-21 01:31:05,018 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 50, loss[loss=0.06769, beats_loss=0.01285, ecapa_loss=0.0001444, whisper_loss=0.05339, over 21488.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.009153, ecapa_loss=0.0001456, whisper_loss=0.09069, over 870693.81 frames. ], batch size: 91, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:31:05,469 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 23 from LS+wenet, 15 from Vox, 36 from AS 2024-08-21 01:31:07,659 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 30 from LS+wenet, 28 from Vox, 28 from AS 2024-08-21 01:31:10,168 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5038250.0, ans=0.1 2024-08-21 01:31:14,564 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 
15 from LS+wenet, 11 from Vox, 30 from AS 2024-08-21 01:31:42,707 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=5038350.0, ans=0.2 2024-08-21 01:32:05,302 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5038450.0, ans=0.125 2024-08-21 01:32:15,311 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=5038450.0, ans=0.1 2024-08-21 01:32:18,766 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.109e+01 2.525e+01 2.864e+01 3.213e+01 4.437e+01, threshold=5.728e+01, percent-clipped=0.0 2024-08-21 01:32:28,796 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 20 from LS+wenet, 14 from Vox, 29 from AS 2024-08-21 01:32:48,823 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 26 from LS+wenet, 19 from Vox, 49 from AS 2024-08-21 01:32:59,434 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5038650.0, ans=0.1 2024-08-21 01:33:02,154 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5038650.0, ans=0.125 2024-08-21 01:33:10,140 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 20 from LS+wenet, 10 from Vox, 35 from AS 2024-08-21 01:33:13,879 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 100, loss[loss=0.09579, beats_loss=0.01036, ecapa_loss=0.0001558, whisper_loss=0.08388, over 21481.00 frames. ], tot_loss[loss=0.09832, beats_loss=0.009186, ecapa_loss=0.0001421, whisper_loss=0.08772, over 1557336.27 frames. ], batch size: 90, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:33:21,545 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 27 from LS+wenet, 28 from Vox, 34 from AS 2024-08-21 01:33:26,111 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 
16 from LS+wenet, 16 from Vox, 27 from AS 2024-08-21 01:33:45,811 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5038850.0, ans=0.0 2024-08-21 01:33:48,572 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=5038850.0, ans=0.0 2024-08-21 01:33:52,694 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 28 from LS+wenet, 23 from Vox, 42 from AS 2024-08-21 01:34:07,046 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=5038950.0, ans=0.05 2024-08-21 01:34:15,445 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=5038950.0, ans=0.09899494936611666 2024-08-21 01:34:26,159 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.23 vs. limit=15.0 2024-08-21 01:35:01,248 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=5039150.0, ans=0.2 2024-08-21 01:35:11,738 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.36 vs. limit=10.0 2024-08-21 01:35:19,886 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 150, loss[loss=0.09408, beats_loss=0.01118, ecapa_loss=0.0001351, whisper_loss=0.08156, over 18769.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.009046, ecapa_loss=0.000143, whisper_loss=0.08979, over 2033335.15 frames. ], batch size: 76, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:35:21,744 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=5039250.0, ans=0.0 2024-08-21 01:35:36,849 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 
19 from LS+wenet, 22 from Vox, 25 from AS 2024-08-21 01:35:46,888 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=5039350.0, ans=0.09899494936611666 2024-08-21 01:36:06,175 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5039450.0, ans=0.125 2024-08-21 01:36:16,731 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 13 from LS+wenet, 18 from Vox, 26 from AS 2024-08-21 01:36:25,330 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.962e+01 2.462e+01 2.718e+01 2.997e+01 1.008e+02, threshold=5.437e+01, percent-clipped=1.0 2024-08-21 01:36:36,602 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 15 from LS+wenet, 29 from Vox, 39 from AS 2024-08-21 01:36:40,365 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.79 vs. limit=15.0 2024-08-21 01:36:46,605 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5039650.0, ans=0.1 2024-08-21 01:36:55,982 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=5039650.0, ans=0.05 2024-08-21 01:37:08,661 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 200, loss[loss=0.075, beats_loss=0.007466, ecapa_loss=0.0002445, whisper_loss=0.06508, over 15528.00 frames. ], tot_loss[loss=0.09982, beats_loss=0.00946, ecapa_loss=0.000142, whisper_loss=0.08894, over 2392385.58 frames. 
], batch size: 64, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:37:09,798 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=5039750.0, ans=0.0 2024-08-21 01:37:15,339 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=5039750.0, ans=0.0 2024-08-21 01:37:22,233 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5039750.0, ans=0.125 2024-08-21 01:37:29,176 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 22 from LS+wenet, 16 from Vox, 30 from AS 2024-08-21 01:37:31,525 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=5039850.0, ans=0.125 2024-08-21 01:37:56,724 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=5039950.0, ans=0.125 2024-08-21 01:37:57,069 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.14 vs. limit=22.5 2024-08-21 01:38:19,831 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=5040050.0, ans=0.125 2024-08-21 01:38:41,740 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 250, loss[loss=0.1137, beats_loss=0.01104, ecapa_loss=0.0001698, whisper_loss=0.101, over 20382.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.009665, ecapa_loss=0.0001407, whisper_loss=0.09013, over 2692360.33 frames. 
], batch size: 86, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:38:55,844 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=5040250.0, ans=0.125 2024-08-21 01:38:56,112 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.20 vs. limit=6.0 2024-08-21 01:38:57,159 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 22 from LS+wenet, 11 from Vox, 39 from AS 2024-08-21 01:39:07,508 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5040350.0, ans=0.125 2024-08-21 01:39:09,464 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=5040350.0, ans=0.2 2024-08-21 01:39:28,884 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 26 from LS+wenet, 22 from Vox, 44 from AS 2024-08-21 01:39:32,593 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5040450.0, ans=0.125 2024-08-21 01:39:34,951 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.76 vs. limit=22.5 2024-08-21 01:39:39,864 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.776e+01 2.294e+01 2.516e+01 2.828e+01 4.079e+01, threshold=5.032e+01, percent-clipped=0.0 2024-08-21 01:39:46,014 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=5040550.0, ans=0.0 2024-08-21 01:40:10,758 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 
16 from LS+wenet, 14 from Vox, 21 from AS 2024-08-21 01:40:17,471 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 300, loss[loss=0.1008, beats_loss=0.0106, ecapa_loss=0.000148, whisper_loss=0.08872, over 15787.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.009922, ecapa_loss=0.0001394, whisper_loss=0.08932, over 2919840.39 frames. ], batch size: 61, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:40:44,690 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 28 from LS+wenet, 12 from Vox, 37 from AS 2024-08-21 01:41:24,133 INFO [train_multi_KD3.py:845] (1/4) A total of 54 cuts. 15 from LS+wenet, 10 from Vox, 29 from AS 2024-08-21 01:41:35,416 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=5041150.0, ans=0.125 2024-08-21 01:41:41,027 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=5041150.0, ans=0.2 2024-08-21 01:41:49,919 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 24 from LS+wenet, 12 from Vox, 24 from AS 2024-08-21 01:41:51,468 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 350, loss[loss=0.1172, beats_loss=0.008925, ecapa_loss=0.0001406, whisper_loss=0.1068, over 15388.00 frames. ], tot_loss[loss=0.09988, beats_loss=0.01004, ecapa_loss=0.0001381, whisper_loss=0.08845, over 3092729.92 frames. ], batch size: 60, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:41:55,739 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=5041250.0, ans=0.2 2024-08-21 01:42:06,019 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5041250.0, ans=0.125 2024-08-21 01:42:15,925 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 15 from LS+wenet, 16 from Vox, 22 from AS 2024-08-21 01:42:32,102 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 
34 from LS+wenet, 24 from Vox, 25 from AS 2024-08-21 01:42:39,266 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 21 from LS+wenet, 16 from Vox, 28 from AS 2024-08-21 01:42:43,552 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.323e+01 2.496e+01 2.850e+01 5.461e+01, threshold=4.991e+01, percent-clipped=1.0 2024-08-21 01:42:46,671 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=5041550.0, ans=0.0 2024-08-21 01:42:46,916 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.30 vs. limit=15.0 2024-08-21 01:42:51,418 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=5041550.0, ans=0.2 2024-08-21 01:42:56,453 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 19 from LS+wenet, 24 from Vox, 20 from AS 2024-08-21 01:43:00,892 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.68 vs. limit=15.0 2024-08-21 01:43:19,304 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 400, loss[loss=0.1111, beats_loss=0.01049, ecapa_loss=0.0001148, whisper_loss=0.09948, over 19107.00 frames. ], tot_loss[loss=0.09972, beats_loss=0.01011, ecapa_loss=0.0001384, whisper_loss=0.08822, over 3203551.33 frames. ], batch size: 72, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:44:18,736 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 26 from LS+wenet, 25 from Vox, 27 from AS 2024-08-21 01:44:50,434 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 450, loss[loss=0.07114, beats_loss=0.01354, ecapa_loss=0.000106, whisper_loss=0.05655, over 18430.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.0101, ecapa_loss=0.000138, whisper_loss=0.08904, over 3336511.62 frames. 
], batch size: 72, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:45:00,169 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 30 from LS+wenet, 24 from Vox, 33 from AS 2024-08-21 01:45:09,075 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 28 from LS+wenet, 23 from Vox, 30 from AS 2024-08-21 01:45:12,722 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=5042350.0, ans=10.0 2024-08-21 01:45:17,957 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5042350.0, ans=0.125 2024-08-21 01:45:43,726 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.632e+01 2.267e+01 2.494e+01 2.807e+01 3.587e+01, threshold=4.988e+01, percent-clipped=0.0 2024-08-21 01:45:48,254 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=5042550.0, ans=0.2 2024-08-21 01:45:49,520 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 24 from LS+wenet, 21 from Vox, 29 from AS 2024-08-21 01:46:20,165 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5042750.0, ans=0.125 2024-08-21 01:46:20,986 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 500, loss[loss=0.09314, beats_loss=0.01073, ecapa_loss=0.0001748, whisper_loss=0.08066, over 20323.00 frames. ], tot_loss[loss=0.09959, beats_loss=0.01012, ecapa_loss=0.0001389, whisper_loss=0.08808, over 3435160.41 frames. ], batch size: 85, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:46:22,028 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.30 vs. 
limit=15.0 2024-08-21 01:46:49,815 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=5042850.0, ans=15.0 2024-08-21 01:47:00,662 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 32 from LS+wenet, 23 from Vox, 36 from AS 2024-08-21 01:47:19,814 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 17 from LS+wenet, 18 from Vox, 26 from AS 2024-08-21 01:47:38,648 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=5043150.0, ans=0.2 2024-08-21 01:47:58,083 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 550, loss[loss=0.09434, beats_loss=0.01348, ecapa_loss=0.0001025, whisper_loss=0.07983, over 22304.00 frames. ], tot_loss[loss=0.09998, beats_loss=0.01013, ecapa_loss=0.0001388, whisper_loss=0.08846, over 3537042.06 frames. ], batch size: 88, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:48:03,098 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=5043250.0, ans=15.0 2024-08-21 01:48:13,758 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 26 from LS+wenet, 22 from Vox, 31 from AS 2024-08-21 01:48:21,499 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.21 vs. limit=15.0 2024-08-21 01:48:43,104 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 36 from LS+wenet, 19 from Vox, 35 from AS 2024-08-21 01:48:54,337 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.277e+01 2.484e+01 2.909e+01 4.062e+02, threshold=4.967e+01, percent-clipped=2.0 2024-08-21 01:49:18,039 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.67 vs. 
limit=10.0 2024-08-21 01:49:29,161 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 19 from LS+wenet, 24 from Vox, 23 from AS 2024-08-21 01:49:30,486 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 600, loss[loss=0.09986, beats_loss=0.008787, ecapa_loss=0.0001609, whisper_loss=0.08947, over 16191.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01014, ecapa_loss=0.0001384, whisper_loss=0.08906, over 3634215.63 frames. ], batch size: 66, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:49:31,499 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5043750.0, ans=0.125 2024-08-21 01:49:57,142 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=5043850.0, ans=0.0 2024-08-21 01:50:09,305 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 25 from LS+wenet, 22 from Vox, 42 from AS 2024-08-21 01:50:13,643 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.88 vs. limit=15.0 2024-08-21 01:51:00,062 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 650, loss[loss=0.1198, beats_loss=0.009265, ecapa_loss=0.0001217, whisper_loss=0.1094, over 18268.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01015, ecapa_loss=0.0001366, whisper_loss=0.08971, over 3684105.39 frames. ], batch size: 68, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:51:09,516 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=5044250.0, ans=0.0 2024-08-21 01:51:25,447 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 
28 from LS+wenet, 24 from Vox, 40 from AS 2024-08-21 01:51:39,021 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5044450.0, ans=0.125 2024-08-21 01:51:43,757 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 23 from LS+wenet, 20 from Vox, 27 from AS 2024-08-21 01:51:51,919 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.764e+01 2.231e+01 2.431e+01 2.762e+01 3.963e+01, threshold=4.863e+01, percent-clipped=0.0 2024-08-21 01:51:59,899 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=5044550.0, ans=0.0 2024-08-21 01:52:13,833 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5044650.0, ans=0.125 2024-08-21 01:52:27,916 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 700, loss[loss=0.1121, beats_loss=0.00841, ecapa_loss=0.0001583, whisper_loss=0.1021, over 17290.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01015, ecapa_loss=0.0001373, whisper_loss=0.0898, over 3690986.72 frames. ], batch size: 70, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:52:30,631 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5044750.0, ans=0.1 2024-08-21 01:52:38,897 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 20 from LS+wenet, 19 from Vox, 25 from AS 2024-08-21 01:52:51,678 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 18 from LS+wenet, 14 from Vox, 34 from AS 2024-08-21 01:53:14,769 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.92 vs. limit=15.0 2024-08-21 01:53:17,763 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 
15 from LS+wenet, 23 from Vox, 20 fro AS 2024-08-21 01:53:17,952 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5044950.0, ans=0.0 2024-08-21 01:53:34,247 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 27 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-21 01:53:38,704 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.34 vs. limit=15.0 2024-08-21 01:53:54,483 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.83 vs. limit=15.0 2024-08-21 01:53:56,826 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 750, loss[loss=0.09504, beats_loss=0.008919, ecapa_loss=0.0001397, whisper_loss=0.08472, over 21186.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01025, ecapa_loss=0.0001385, whisper_loss=0.08865, over 3713452.33 frames. ], batch size: 84, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:53:59,543 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 24 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-21 01:54:22,606 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.30 vs. limit=10.0 2024-08-21 01:54:26,873 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 30 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-21 01:54:29,037 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5045350.0, ans=0.0 2024-08-21 01:54:30,114 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 
17 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-21 01:54:30,349 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5045450.0, ans=0.125 2024-08-21 01:54:49,502 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.245e+01 2.497e+01 2.753e+01 9.624e+01, threshold=4.995e+01, percent-clipped=1.0 2024-08-21 01:54:58,889 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 28 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-21 01:55:25,999 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 800, loss[loss=0.1064, beats_loss=0.01193, ecapa_loss=0.0001166, whisper_loss=0.09327, over 22070.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01024, ecapa_loss=0.0001385, whisper_loss=0.08889, over 3710537.90 frames. ], batch size: 88, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:55:26,180 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 19 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-21 01:55:27,985 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5045750.0, ans=0.125 2024-08-21 01:55:28,671 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.08 vs. limit=15.0 2024-08-21 01:55:48,843 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.32 vs. 
limit=15.0 2024-08-21 01:55:56,533 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=5045850.0, ans=0.07 2024-08-21 01:56:14,448 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=5045950.0, ans=0.1 2024-08-21 01:56:25,354 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.44 vs. limit=15.0 2024-08-21 01:56:31,554 WARNING [optim.py:496] (1/4) Scaling gradients by 0.057700227946043015, model_norm_threshold=49.945823669433594 2024-08-21 01:56:31,715 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.18, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.361e+05, grad_sumsq=1.361e+05, orig_rms_sq=1.000e+00 2024-08-21 01:56:41,385 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5046150.0, ans=0.1 2024-08-21 01:56:51,100 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=5046150.0, ans=0.2 2024-08-21 01:56:53,837 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 850, loss[loss=0.1165, beats_loss=0.007448, ecapa_loss=0.0001576, whisper_loss=0.1075, over 15773.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01019, ecapa_loss=0.0001379, whisper_loss=0.08924, over 3717707.06 frames. 
], batch size: 60, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:57:00,587 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5046250.0, ans=0.125 2024-08-21 01:57:07,729 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=5046250.0, ans=0.0 2024-08-21 01:57:39,395 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 30 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-21 01:57:48,501 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.265e+01 2.514e+01 2.854e+01 8.656e+02, threshold=5.028e+01, percent-clipped=3.0 2024-08-21 01:58:05,551 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 25 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-21 01:58:09,606 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5046650.0, ans=0.1 2024-08-21 01:58:25,199 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 900, loss[loss=0.1094, beats_loss=0.008843, ecapa_loss=0.0001543, whisper_loss=0.09904, over 23118.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01018, ecapa_loss=0.0001378, whisper_loss=0.08932, over 3764453.09 frames. ], batch size: 94, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:58:44,968 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=5046850.0, ans=0.125 2024-08-21 01:58:51,373 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 
33 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-21 01:59:29,535 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5047050.0, ans=0.0 2024-08-21 01:59:55,747 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 950, loss[loss=0.09246, beats_loss=0.01193, ecapa_loss=0.0001083, whisper_loss=0.07945, over 15661.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01018, ecapa_loss=0.0001372, whisper_loss=0.08971, over 3782443.67 frames. ], batch size: 59, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 02:00:09,820 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 19 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-21 02:00:26,145 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5047350.0, ans=0.125 2024-08-21 02:00:29,801 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=5047450.0, ans=0.125 2024-08-21 02:00:47,143 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 29 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-21 02:00:48,155 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.565e+01 2.167e+01 2.360e+01 2.609e+01 1.184e+02, threshold=4.721e+01, percent-clipped=1.0 2024-08-21 02:01:04,515 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=5047650.0, ans=0.0 2024-08-21 02:01:04,577 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=5047650.0, ans=0.95 2024-08-21 02:01:07,664 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 16 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-21 02:01:10,299 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.19 vs. 
limit=15.0 2024-08-21 02:01:23,452 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 1000, loss[loss=0.1015, beats_loss=0.009916, ecapa_loss=0.0001327, whisper_loss=0.0903, over 14247.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01028, ecapa_loss=0.0001366, whisper_loss=0.08875, over 3764661.18 frames. ], batch size: 52, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 02:01:26,980 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 24 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-21 02:01:37,775 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5047750.0, ans=0.125 2024-08-21 02:01:39,273 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 27 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-21 02:01:40,959 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 11 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-21 02:01:46,317 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 18 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-21 02:01:55,131 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.89 vs. limit=15.0 2024-08-21 02:02:11,736 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.48 vs. limit=15.0 2024-08-21 02:02:22,311 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.06 vs. 
limit=10.0 2024-08-21 02:02:35,050 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=5048150.0, ans=10.0 2024-08-21 02:02:40,822 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5048150.0, ans=0.125 2024-08-21 02:02:42,455 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 18 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-21 02:02:48,578 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5048150.0, ans=0.1 2024-08-21 02:02:53,582 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 1050, loss[loss=0.1366, beats_loss=0.009717, ecapa_loss=0.0001454, whisper_loss=0.1254, over 21051.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01028, ecapa_loss=0.0001357, whisper_loss=0.08891, over 3809103.56 frames. ], batch size: 81, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 02:03:13,181 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-21 02:03:23,620 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5048350.0, ans=0.125 2024-08-21 02:03:23,754 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.32 vs. limit=10.0 2024-08-21 02:03:30,879 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5048450.0, ans=0.1 2024-08-21 02:03:41,904 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 
20 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-21 02:03:50,213 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.739e+01 2.332e+01 2.561e+01 2.821e+01 8.058e+01, threshold=5.122e+01, percent-clipped=2.0 2024-08-21 02:04:27,045 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 1100, loss[loss=0.106, beats_loss=0.01146, ecapa_loss=0.0001113, whisper_loss=0.0934, over 22349.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01029, ecapa_loss=0.0001351, whisper_loss=0.0894, over 3798476.72 frames. ], batch size: 87, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:04:44,120 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5048850.0, ans=0.125 2024-08-21 02:05:06,460 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 29 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-21 02:05:10,289 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 28 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-21 02:05:10,644 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=5.070e+00 2024-08-21 02:05:32,070 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5049050.0, ans=0.0 2024-08-21 02:05:52,735 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5049150.0, ans=0.1 2024-08-21 02:05:55,489 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5049150.0, ans=0.125 2024-08-21 02:05:58,277 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 1150, loss[loss=0.09915, beats_loss=0.01289, ecapa_loss=0.0001061, whisper_loss=0.0852, over 18105.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01024, ecapa_loss=0.0001357, whisper_loss=0.09015, over 3806705.33 frames. 
], batch size: 71, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:06:25,692 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=5049350.0, ans=0.125 2024-08-21 02:06:45,758 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 29 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-21 02:06:50,377 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.820e+01 2.348e+01 2.579e+01 2.822e+01 4.118e+01, threshold=5.158e+01, percent-clipped=0.0 2024-08-21 02:06:58,348 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=5049550.0, ans=0.2 2024-08-21 02:07:12,513 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.65 vs. limit=10.0 2024-08-21 02:07:14,230 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.24 vs. limit=15.0 2024-08-21 02:07:24,517 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=5049750.0, ans=0.0 2024-08-21 02:07:25,359 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 1200, loss[loss=0.1056, beats_loss=0.01034, ecapa_loss=0.0001122, whisper_loss=0.09413, over 22530.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01016, ecapa_loss=0.0001358, whisper_loss=0.09064, over 3764862.70 frames. ], batch size: 85, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:07:47,067 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=5049850.0, ans=0.125 2024-08-21 02:07:51,660 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 
24 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-21 02:08:14,074 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.38 vs. limit=22.5 2024-08-21 02:08:15,826 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=5050050.0, ans=0.0 2024-08-21 02:08:50,143 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.71 vs. limit=15.0 2024-08-21 02:08:52,680 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 1250, loss[loss=0.08583, beats_loss=0.01403, ecapa_loss=0.0001287, whisper_loss=0.07051, over 20850.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01026, ecapa_loss=0.0001356, whisper_loss=0.08966, over 3754008.72 frames. ], batch size: 87, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:08:54,658 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=5050250.0, ans=0.125 2024-08-21 02:09:11,839 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. limit=6.0 2024-08-21 02:09:13,033 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 17 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-21 02:09:17,209 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.83 vs. limit=15.0 2024-08-21 02:09:20,008 INFO [train_multi_KD3.py:845] (1/4) A total of 95 cuts. 26 from LS+wenet, 31 from Vox, 38 fro AS 2024-08-21 02:09:36,480 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 
15 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-21 02:09:41,863 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=5050450.0, ans=0.125 2024-08-21 02:09:46,790 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.841e+01 2.192e+01 2.364e+01 2.564e+01 4.097e+01, threshold=4.729e+01, percent-clipped=0.0 2024-08-21 02:10:00,612 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=5050550.0, ans=0.125 2024-08-21 02:10:00,641 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=5050550.0, ans=0.07 2024-08-21 02:10:02,054 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=5050550.0, ans=0.02 2024-08-21 02:10:07,474 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=5050650.0, ans=0.2 2024-08-21 02:10:23,327 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 1300, loss[loss=0.1149, beats_loss=0.01044, ecapa_loss=0.0001294, whisper_loss=0.1032, over 21176.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01031, ecapa_loss=0.0001354, whisper_loss=0.08906, over 3758121.70 frames. ], batch size: 81, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:10:28,160 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.49 vs. limit=12.0 2024-08-21 02:10:56,388 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 
25 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-21 02:11:08,114 WARNING [optim.py:496] (1/4) Scaling gradients by 0.01577102579176426, model_norm_threshold=47.28926467895508 2024-08-21 02:11:08,274 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.32, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.852e+06, grad_sumsq=2.852e+06, orig_rms_sq=1.000e+00 2024-08-21 02:11:13,900 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=5050950.0, ans=0.0 2024-08-21 02:11:25,968 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=5051050.0, ans=0.125 2024-08-21 02:11:47,349 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5051150.0, ans=0.125 2024-08-21 02:11:51,333 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5051250.0, ans=0.125 2024-08-21 02:11:52,003 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 1350, loss[loss=0.1062, beats_loss=0.008908, ecapa_loss=0.0001303, whisper_loss=0.09595, over 14868.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01032, ecapa_loss=0.0001357, whisper_loss=0.08892, over 3774316.91 frames. ], batch size: 57, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:12:27,488 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-21 02:12:47,030 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.723e+01 2.226e+01 2.529e+01 2.867e+01 2.998e+03, threshold=5.057e+01, percent-clipped=1.0 2024-08-21 02:13:01,041 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 
30 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-21 02:13:21,490 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=5051750.0, ans=0.0 2024-08-21 02:13:23,046 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 1400, loss[loss=0.09918, beats_loss=0.008627, ecapa_loss=0.0001959, whisper_loss=0.08859, over 13613.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01026, ecapa_loss=0.0001368, whisper_loss=0.08893, over 3739995.37 frames. ], batch size: 58, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:13:29,994 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 19 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-21 02:13:37,769 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=5051750.0, ans=0.125 2024-08-21 02:14:01,191 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=5051950.0, ans=0.125 2024-08-21 02:14:55,676 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 1450, loss[loss=0.09043, beats_loss=0.00972, ecapa_loss=0.0001208, whisper_loss=0.0795, over 13253.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01021, ecapa_loss=0.0001366, whisper_loss=0.08936, over 3727371.49 frames. ], batch size: 50, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:15:01,761 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.11 vs. 
limit=15.0 2024-08-21 02:15:14,157 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=5052350.0, ans=0.125 2024-08-21 02:15:14,210 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=5052350.0, ans=0.125 2024-08-21 02:15:31,692 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=5052450.0, ans=0.0 2024-08-21 02:15:37,035 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5052450.0, ans=0.125 2024-08-21 02:15:49,354 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.283e+01 2.598e+01 2.871e+01 4.818e+01, threshold=5.197e+01, percent-clipped=0.0 2024-08-21 02:15:49,607 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 30 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-21 02:15:51,130 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 12 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-21 02:16:42,134 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 1500, loss[loss=0.1185, beats_loss=0.009188, ecapa_loss=0.0001267, whisper_loss=0.108, over 23325.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01027, ecapa_loss=0.0001361, whisper_loss=0.08862, over 3719636.82 frames. 
], batch size: 89, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:17:16,556 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=5052850.0, ans=0.2 2024-08-21 02:17:30,175 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5052950.0, ans=0.125 2024-08-21 02:17:44,872 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=5053050.0, ans=0.125 2024-08-21 02:17:46,830 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5053050.0, ans=0.1 2024-08-21 02:17:52,170 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 33 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-21 02:17:58,012 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 27 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-21 02:17:58,211 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=5053150.0, ans=0.0 2024-08-21 02:18:16,706 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 1550, loss[loss=0.112, beats_loss=0.008445, ecapa_loss=0.0001283, whisper_loss=0.1023, over 20560.00 frames. ], tot_loss[loss=0.09995, beats_loss=0.01027, ecapa_loss=0.0001349, whisper_loss=0.08832, over 3735812.35 frames. ], batch size: 77, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:18:30,640 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-21 02:18:52,425 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 
23 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-21 02:18:54,504 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5053450.0, ans=0.125 2024-08-21 02:19:13,375 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.638e+01 2.172e+01 2.388e+01 2.761e+01 1.037e+02, threshold=4.777e+01, percent-clipped=1.0 2024-08-21 02:19:15,346 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 24 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-21 02:19:19,217 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 24 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-21 02:19:26,719 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 27 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-21 02:19:37,848 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 22 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-21 02:19:41,691 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-21 02:19:43,462 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 19 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-21 02:19:50,431 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 1600, loss[loss=0.1153, beats_loss=0.008795, ecapa_loss=0.0001189, whisper_loss=0.1053, over 19878.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01022, ecapa_loss=0.0001363, whisper_loss=0.08887, over 3731365.50 frames. ], batch size: 75, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:19:54,208 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=5053750.0, ans=0.125 2024-08-21 02:20:07,585 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.37 vs. 
limit=15.0 2024-08-21 02:20:08,442 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=5053850.0, ans=0.125 2024-08-21 02:20:19,121 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 25 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-21 02:20:37,666 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 15 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-21 02:20:41,584 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=5053950.0, ans=0.2 2024-08-21 02:21:04,454 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 20 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-21 02:21:13,647 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5054150.0, ans=0.125 2024-08-21 02:21:20,726 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 1650, loss[loss=0.1058, beats_loss=0.01081, ecapa_loss=0.0001202, whisper_loss=0.09382, over 17526.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01018, ecapa_loss=0.0001373, whisper_loss=0.08905, over 3732799.63 frames. ], batch size: 67, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:21:21,624 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5054250.0, ans=0.125 2024-08-21 02:22:08,935 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.76 vs. limit=6.0 2024-08-21 02:22:11,519 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 
37 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-21 02:22:12,118 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=5054450.0, ans=0.5 2024-08-21 02:22:14,943 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.709e+01 2.289e+01 2.512e+01 2.829e+01 4.001e+01, threshold=5.024e+01, percent-clipped=0.0 2024-08-21 02:22:17,644 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=5054550.0, ans=0.125 2024-08-21 02:22:23,236 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=5054550.0, ans=0.0 2024-08-21 02:22:35,925 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=5054650.0, ans=0.0 2024-08-21 02:22:36,234 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=5054650.0, ans=15.0 2024-08-21 02:22:39,118 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 20 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-21 02:22:39,313 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5054650.0, ans=0.1 2024-08-21 02:22:51,665 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 1700, loss[loss=0.0873, beats_loss=0.008197, ecapa_loss=0.0001604, whisper_loss=0.0775, over 14740.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01019, ecapa_loss=0.0001375, whisper_loss=0.08936, over 3726856.15 frames. ], batch size: 57, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:22:57,754 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.43 vs. 
limit=15.0
2024-08-21 02:23:14,636 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=5054850.0, ans=0.0
2024-08-21 02:23:18,718 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5054850.0, ans=0.1
2024-08-21 02:23:28,786 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 25 from LS+wenet, 11 from Vox, 27 from AS
2024-08-21 02:23:34,242 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 21 from LS+wenet, 16 from Vox, 18 from AS
2024-08-21 02:24:13,049 INFO [train_multi_KD3.py:845] (1/4) A total of 54 cuts. 17 from LS+wenet, 8 from Vox, 29 from AS
2024-08-21 02:24:15,119 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5055150.0, ans=0.1
2024-08-21 02:24:19,010 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 27 from LS+wenet, 24 from Vox, 43 from AS
2024-08-21 02:24:20,543 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=5055150.0, ans=0.0
2024-08-21 02:24:22,662 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 27 from LS+wenet, 21 from Vox, 37 from AS
2024-08-21 02:24:23,801 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 1750, loss[loss=0.09602, beats_loss=0.01175, ecapa_loss=0.0001473, whisper_loss=0.0828, over 20351.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01019, ecapa_loss=0.0001373, whisper_loss=0.08976, over 3708527.57 frames. ], batch size: 85, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:24:26,563 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=5055250.0, ans=0.07
2024-08-21 02:24:27,935 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 27 from LS+wenet, 22 from Vox, 28 from AS
2024-08-21 02:24:41,425 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 28 from LS+wenet, 17 from Vox, 47 from AS
2024-08-21 02:24:54,356 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=5055350.0, ans=0.0
2024-08-21 02:24:54,397 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5055350.0, ans=0.1
2024-08-21 02:24:58,225 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=5055350.0, ans=0.2
2024-08-21 02:25:19,027 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.242e+01 2.435e+01 2.822e+01 2.727e+02, threshold=4.871e+01, percent-clipped=1.0
2024-08-21 02:25:24,633 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 20 from LS+wenet, 20 from Vox, 23 from AS
2024-08-21 02:25:36,370 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.79 vs. limit=6.0
2024-08-21 02:25:42,736 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=5055650.0, ans=0.0
2024-08-21 02:25:55,135 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 1800, loss[loss=0.1038, beats_loss=0.007469, ecapa_loss=0.0001502, whisper_loss=0.09481, over 21252.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01018, ecapa_loss=0.0001371, whisper_loss=0.08978, over 3733622.23 frames. ], batch size: 82, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:26:17,408 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=5055850.0, ans=0.025
2024-08-21 02:26:19,296 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5055850.0, ans=0.125
2024-08-21 02:26:51,998 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=5056050.0, ans=0.2
2024-08-21 02:27:00,142 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 25 from LS+wenet, 13 from Vox, 30 from AS
2024-08-21 02:27:11,311 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=5056150.0, ans=0.125
2024-08-21 02:27:14,758 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5056150.0, ans=0.0
2024-08-21 02:27:26,621 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 1850, loss[loss=0.09667, beats_loss=0.01269, ecapa_loss=0.0001288, whisper_loss=0.0827, over 22779.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01024, ecapa_loss=0.0001367, whisper_loss=0.08904, over 3728967.82 frames. ], batch size: 89, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:27:34,079 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=5056250.0, ans=0.0
2024-08-21 02:28:00,907 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=5056450.0, ans=10.0
2024-08-21 02:28:02,605 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 27 from LS+wenet, 15 from Vox, 29 from AS
2024-08-21 02:28:21,038 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.279e+01 2.495e+01 2.833e+01 4.581e+01, threshold=4.990e+01, percent-clipped=0.0
2024-08-21 02:28:42,344 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=5056650.0, ans=0.125
2024-08-21 02:28:58,296 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 1900, loss[loss=0.09289, beats_loss=0.009163, ecapa_loss=0.0001328, whisper_loss=0.0824, over 14707.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01025, ecapa_loss=0.0001356, whisper_loss=0.08886, over 3747274.11 frames. ], batch size: 59, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:29:20,515 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.66 vs. limit=15.0
2024-08-21 02:29:56,108 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=5057050.0, ans=0.05
2024-08-21 02:30:19,988 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=5057150.0, ans=0.0
2024-08-21 02:30:23,604 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5057150.0, ans=0.125
2024-08-21 02:30:29,920 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 1950, loss[loss=0.1075, beats_loss=0.01025, ecapa_loss=0.0001275, whisper_loss=0.09596, over 19732.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.0103, ecapa_loss=0.000135, whisper_loss=0.08875, over 3743473.32 frames. ], batch size: 80, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:30:35,811 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=5057250.0, ans=0.125
2024-08-21 02:30:44,692 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 29 from LS+wenet, 23 from Vox, 40 from AS
2024-08-21 02:30:53,227 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 24 from LS+wenet, 25 from Vox, 43 from AS
2024-08-21 02:31:25,603 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.215e+01 2.451e+01 2.695e+01 5.295e+01, threshold=4.901e+01, percent-clipped=1.0
2024-08-21 02:31:41,152 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.97 vs. limit=15.0
2024-08-21 02:31:44,897 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=5057650.0, ans=0.125
2024-08-21 02:31:46,230 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=5057650.0, ans=0.0
2024-08-21 02:31:47,759 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=5057650.0, ans=0.125
2024-08-21 02:31:51,425 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 28 from LS+wenet, 19 from Vox, 41 from AS
2024-08-21 02:31:51,992 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=5057650.0, ans=0.0
2024-08-21 02:31:53,615 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 32 from LS+wenet, 13 from Vox, 35 from AS
2024-08-21 02:31:59,822 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-21 02:32:02,170 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 2000, loss[loss=0.05157, beats_loss=0.01322, ecapa_loss=0.000112, whisper_loss=0.03723, over 13061.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01037, ecapa_loss=0.0001353, whisper_loss=0.08865, over 3767308.75 frames. ], batch size: 52, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:32:03,112 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5057750.0, ans=0.125
2024-08-21 02:32:23,788 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=5057850.0, ans=0.0
2024-08-21 02:33:34,308 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 2050, loss[loss=0.07551, beats_loss=0.01224, ecapa_loss=0.000131, whisper_loss=0.06196, over 15686.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01041, ecapa_loss=0.0001338, whisper_loss=0.08837, over 3752800.57 frames. ], batch size: 65, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:34:30,319 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.712e+01 2.290e+01 2.506e+01 2.810e+01 1.281e+02, threshold=5.013e+01, percent-clipped=3.0
2024-08-21 02:34:33,587 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0
2024-08-21 02:34:43,615 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 17 from LS+wenet, 21 from Vox, 28 from AS
2024-08-21 02:34:49,941 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=5058650.0, ans=0.0
2024-08-21 02:34:53,068 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 29 from LS+wenet, 26 from Vox, 35 from AS
2024-08-21 02:35:06,269 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 2100, loss[loss=0.09616, beats_loss=0.01067, ecapa_loss=0.0001329, whisper_loss=0.08416, over 19441.00 frames. ], tot_loss[loss=0.1, beats_loss=0.01043, ecapa_loss=0.0001339, whisper_loss=0.08826, over 3745998.14 frames. ], batch size: 74, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:35:16,333 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=5058750.0, ans=0.0
2024-08-21 02:35:33,326 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.53 vs. limit=15.0
2024-08-21 02:35:46,620 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.23 vs. limit=15.0
2024-08-21 02:35:57,839 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5058950.0, ans=0.125
2024-08-21 02:36:21,579 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 27 from LS+wenet, 21 from Vox, 41 from AS
2024-08-21 02:36:26,067 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.74 vs. limit=12.0
2024-08-21 02:36:37,055 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 2150, loss[loss=0.1019, beats_loss=0.007912, ecapa_loss=0.0001965, whisper_loss=0.09205, over 14772.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01042, ecapa_loss=0.0001337, whisper_loss=0.08849, over 3742889.22 frames. ], batch size: 60, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:36:37,876 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.65 vs. limit=15.0
2024-08-21 02:36:48,004 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.50 vs. limit=15.0
2024-08-21 02:36:50,194 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5059250.0, ans=0.0
2024-08-21 02:36:50,221 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=5059250.0, ans=0.0
2024-08-21 02:37:35,264 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.718e+01 2.212e+01 2.472e+01 2.786e+01 4.629e+01, threshold=4.943e+01, percent-clipped=0.0
2024-08-21 02:37:41,926 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=5059550.0, ans=0.0
2024-08-21 02:37:50,003 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 30 from LS+wenet, 15 from Vox, 24 from AS
2024-08-21 02:38:09,807 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 27 from LS+wenet, 21 from Vox, 42 from AS
2024-08-21 02:38:13,157 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 2200, loss[loss=0.1008, beats_loss=0.01084, ecapa_loss=0.0001261, whisper_loss=0.08874, over 15045.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01037, ecapa_loss=0.0001336, whisper_loss=0.08853, over 3753981.95 frames. ], batch size: 58, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:38:15,928 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5059750.0, ans=0.0
2024-08-21 02:38:50,288 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. limit=6.0
2024-08-21 02:38:57,633 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=5059950.0, ans=0.125
2024-08-21 02:39:01,011 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=5059950.0, ans=0.95
2024-08-21 02:39:04,587 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=5059950.0, ans=0.0
2024-08-21 02:39:04,819 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.21 vs. limit=10.0
2024-08-21 02:39:31,340 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5060150.0, ans=0.125
2024-08-21 02:39:45,172 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 2250, loss[loss=0.107, beats_loss=0.008744, ecapa_loss=0.0001695, whisper_loss=0.09659, over 19850.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01031, ecapa_loss=0.0001351, whisper_loss=0.08981, over 3765800.90 frames. ], batch size: 79, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:40:04,047 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=5060350.0, ans=0.0
2024-08-21 02:40:09,265 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=5060350.0, ans=0.015
2024-08-21 02:40:36,948 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.61 vs. limit=15.0
2024-08-21 02:40:39,322 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.650e+01 2.265e+01 2.538e+01 2.956e+01 4.238e+01, threshold=5.075e+01, percent-clipped=0.0
2024-08-21 02:40:42,056 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5060550.0, ans=0.1
2024-08-21 02:40:49,130 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 28 from LS+wenet, 20 from Vox, 39 from AS
2024-08-21 02:41:02,229 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=5060650.0, ans=0.0
2024-08-21 02:41:05,064 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=5060650.0, ans=0.125
2024-08-21 02:41:14,834 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 2300, loss[loss=0.0712, beats_loss=0.009687, ecapa_loss=0.0001192, whisper_loss=0.06032, over 18107.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01037, ecapa_loss=0.000136, whisper_loss=0.08938, over 3746303.36 frames. ], batch size: 69, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:41:33,353 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5060850.0, ans=0.125
2024-08-21 02:41:46,057 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=5060850.0, ans=0.0
2024-08-21 02:41:58,951 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 32 from LS+wenet, 18 from Vox, 33 from AS
2024-08-21 02:42:17,962 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 21 from LS+wenet, 22 from Vox, 35 from AS
2024-08-21 02:42:35,297 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0
2024-08-21 02:42:48,627 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 2350, loss[loss=0.08797, beats_loss=0.009739, ecapa_loss=0.000145, whisper_loss=0.07678, over 13141.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01036, ecapa_loss=0.0001369, whisper_loss=0.08988, over 3775886.00 frames. ], batch size: 51, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:42:48,869 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 18 from LS+wenet, 24 from Vox, 34 from AS
2024-08-21 02:43:30,771 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=5061450.0, ans=0.125
2024-08-21 02:43:34,251 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.263e-01
2024-08-21 02:43:36,105 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=5061450.0, ans=0.125
2024-08-21 02:43:37,304 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 19 from LS+wenet, 28 from Vox, 29 from AS
2024-08-21 02:43:42,302 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 16 from LS+wenet, 21 from Vox, 28 from AS
2024-08-21 02:43:50,626 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.707e+01 2.279e+01 2.504e+01 2.806e+01 9.902e+01, threshold=5.007e+01, percent-clipped=2.0
2024-08-21 02:44:22,552 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 28 from LS+wenet, 30 from Vox, 29 from AS
2024-08-21 02:44:31,001 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 2400, loss[loss=0.1025, beats_loss=0.008755, ecapa_loss=0.00017, whisper_loss=0.09206, over 19023.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0104, ecapa_loss=0.0001371, whisper_loss=0.08947, over 3759972.65 frames. ], batch size: 77, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:44:36,844 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5061750.0, ans=0.125
2024-08-21 02:44:50,509 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=5061850.0, ans=0.0
2024-08-21 02:45:06,338 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=5061850.0, ans=0.2
2024-08-21 02:45:06,648 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.47 vs. limit=15.0
2024-08-21 02:45:07,340 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 28 from LS+wenet, 16 from Vox, 31 from AS
2024-08-21 02:45:10,567 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=5061850.0, ans=0.0
2024-08-21 02:45:30,451 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 15 from LS+wenet, 15 from Vox, 31 from AS
2024-08-21 02:45:32,722 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 26 from LS+wenet, 13 from Vox, 26 from AS
2024-08-21 02:45:34,655 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.41 vs. limit=15.0
2024-08-21 02:45:36,847 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 28 from LS+wenet, 24 from Vox, 40 from AS
2024-08-21 02:45:38,951 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 16 from LS+wenet, 19 from Vox, 20 from AS
2024-08-21 02:45:40,265 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5062050.0, ans=0.125
2024-08-21 02:46:02,883 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 23 from LS+wenet, 26 from Vox, 42 from AS
2024-08-21 02:46:12,990 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5062150.0, ans=0.0
2024-08-21 02:46:23,532 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 2450, loss[loss=0.08757, beats_loss=0.01105, ecapa_loss=0.0001326, whisper_loss=0.07519, over 18987.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01036, ecapa_loss=0.000137, whisper_loss=0.08974, over 3774465.86 frames. ], batch size: 76, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:47:34,862 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.316e+01 2.526e+01 2.772e+01 3.117e+02, threshold=5.053e+01, percent-clipped=1.0
2024-08-21 02:47:54,440 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5062550.0, ans=0.125
2024-08-21 02:48:03,966 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5062650.0, ans=0.1
2024-08-21 02:48:24,286 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 2500, loss[loss=0.1101, beats_loss=0.007283, ecapa_loss=0.0001292, whisper_loss=0.1015, over 16311.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01031, ecapa_loss=0.0001367, whisper_loss=0.08965, over 3789374.61 frames. ], batch size: 59, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:48:26,190 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5062750.0, ans=0.0
2024-08-21 02:48:29,212 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5062750.0, ans=0.125
2024-08-21 02:48:29,445 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.77 vs. limit=15.0
2024-08-21 02:48:31,029 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=5062750.0, ans=0.0
2024-08-21 02:48:31,449 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.47 vs. limit=15.0
2024-08-21 02:48:38,641 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=5062750.0, ans=0.125
2024-08-21 02:48:42,528 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 22 from LS+wenet, 15 from Vox, 23 from AS
2024-08-21 02:48:51,501 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=5062850.0, ans=0.09899494936611666
2024-08-21 02:49:17,097 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 16 from LS+wenet, 18 from Vox, 23 from AS
2024-08-21 02:49:26,920 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 19 from LS+wenet, 14 from Vox, 26 from AS
2024-08-21 02:49:44,615 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 22 from LS+wenet, 30 from Vox, 19 from AS
2024-08-21 02:49:52,804 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 26 from LS+wenet, 26 from Vox, 38 from AS
2024-08-21 02:49:56,916 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 26 from LS+wenet, 24 from Vox, 37 from AS
2024-08-21 02:50:13,083 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 2550, loss[loss=0.09848, beats_loss=0.009697, ecapa_loss=0.0001587, whisper_loss=0.0872, over 17575.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.0104, ecapa_loss=0.0001368, whisper_loss=0.08909, over 3793678.95 frames. ], batch size: 71, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:50:30,769 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5063250.0, ans=0.125
2024-08-21 02:50:47,579 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=5063350.0, ans=0.125
2024-08-21 02:51:02,696 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 25 from LS+wenet, 19 from Vox, 44 from AS
2024-08-21 02:51:20,086 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.281e+01 2.497e+01 2.831e+01 4.835e+01, threshold=4.995e+01, percent-clipped=0.0
2024-08-21 02:51:26,715 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.86 vs. limit=15.0
2024-08-21 02:51:41,069 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=5063550.0, ans=0.0
2024-08-21 02:52:01,898 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 17 from LS+wenet, 23 from Vox, 19 from AS
2024-08-21 02:52:09,134 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 2600, loss[loss=0.1266, beats_loss=0.007329, ecapa_loss=0.0001508, whisper_loss=0.1178, over 24072.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01027, ecapa_loss=0.0001383, whisper_loss=0.09015, over 3831133.57 frames. ], batch size: 94, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:52:23,603 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 22 from LS+wenet, 17 from Vox, 34 from AS
2024-08-21 02:52:30,595 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 34 from LS+wenet, 20 from Vox, 38 from AS
2024-08-21 02:52:36,854 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 18 from LS+wenet, 13 from Vox, 25 from AS
2024-08-21 02:53:32,414 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 21 from LS+wenet, 29 from Vox, 41 from AS
2024-08-21 02:53:42,534 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=5064050.0, ans=0.125
2024-08-21 02:54:21,844 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 2650, loss[loss=0.09574, beats_loss=0.007849, ecapa_loss=0.0001629, whisper_loss=0.08626, over 21517.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01025, ecapa_loss=0.0001387, whisper_loss=0.09072, over 3835908.89 frames. ], batch size: 86, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:54:23,182 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.64 vs. limit=22.5
2024-08-21 02:54:23,282 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.40 vs. limit=22.5
2024-08-21 02:54:34,185 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 19 from LS+wenet, 19 from Vox, 38 from AS
2024-08-21 02:54:40,647 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5064250.0, ans=0.125
2024-08-21 02:55:05,127 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=5064350.0, ans=0.0
2024-08-21 02:55:06,131 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 28 from LS+wenet, 19 from Vox, 43 from AS
2024-08-21 02:55:13,680 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=5064450.0, ans=0.125
2024-08-21 02:55:38,071 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=5064450.0, ans=0.0
2024-08-21 02:55:41,674 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.703e+01 2.306e+01 2.544e+01 2.939e+01 3.967e+01, threshold=5.087e+01, percent-clipped=0.0
2024-08-21 02:55:48,367 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5064550.0, ans=0.125
2024-08-21 02:56:09,210 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 19 from LS+wenet, 11 from Vox, 20 from AS
2024-08-21 02:56:32,248 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 2700, loss[loss=0.1152, beats_loss=0.01029, ecapa_loss=0.0001555, whisper_loss=0.1034, over 19839.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01021, ecapa_loss=0.0001385, whisper_loss=0.09102, over 3833598.83 frames. ], batch size: 81, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:57:00,701 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 29 from LS+wenet, 20 from Vox, 41 from AS
2024-08-21 02:58:04,818 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=5065050.0, ans=0.0
2024-08-21 02:58:04,872 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=5065050.0, ans=0.125
2024-08-21 02:58:23,745 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 22 from LS+wenet, 15 from Vox, 25 from AS
2024-08-21 02:58:31,645 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 27 from LS+wenet, 19 from Vox, 36 from AS
2024-08-21 02:58:39,598 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 27 from LS+wenet, 12 from Vox, 25 from AS
2024-08-21 02:58:42,136 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 2750, loss[loss=0.1253, beats_loss=0.009654, ecapa_loss=0.0001304, whisper_loss=0.1143, over 16695.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01027, ecapa_loss=0.0001384, whisper_loss=0.09052, over 3828672.47 frames. ], batch size: 64, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:58:58,347 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 30 from LS+wenet, 22 from Vox, 39 from AS
2024-08-21 02:59:08,526 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 22 from LS+wenet, 25 from Vox, 32 from AS
2024-08-21 02:59:59,984 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.389e+01 2.549e+01 2.769e+01 6.929e+01, threshold=5.098e+01, percent-clipped=1.0
2024-08-21 03:00:09,691 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 14 from LS+wenet, 13 from Vox, 32 from AS
2024-08-21 03:00:22,303 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 21 from LS+wenet, 23 from Vox, 22 from AS
2024-08-21 03:00:47,828 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=5065750.0, ans=0.2
2024-08-21 03:00:48,655 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 2800, loss[loss=0.1025, beats_loss=0.01094, ecapa_loss=0.0001363, whisper_loss=0.09017, over 21603.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01026, ecapa_loss=0.000138, whisper_loss=0.0904, over 3817927.95 frames. ], batch size: 86, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 03:00:50,468 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5065750.0, ans=0.1
2024-08-21 03:01:05,518 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5065750.0, ans=0.125
2024-08-21 03:01:21,727 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=5065850.0, ans=0.2
2024-08-21 03:01:42,424 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 26 from LS+wenet, 18 from Vox, 32 from AS
2024-08-21 03:02:39,136 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.64 vs. limit=15.0
2024-08-21 03:02:56,665 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 2850, loss[loss=0.1162, beats_loss=0.009351, ecapa_loss=0.0001793, whisper_loss=0.105, over 21941.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01024, ecapa_loss=0.0001385, whisper_loss=0.09028, over 3823012.56 frames. ], batch size: 91, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 03:03:33,057 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.86 vs. limit=10.0
2024-08-21 03:03:38,792 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 25 from LS+wenet, 25 from Vox, 31 from AS
2024-08-21 03:03:48,566 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=5066450.0, ans=0.07
2024-08-21 03:03:58,147 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=5066450.0, ans=0.2
2024-08-21 03:04:14,241 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.291e+01 2.516e+01 2.868e+01 4.695e+01, threshold=5.032e+01, percent-clipped=0.0
2024-08-21 03:04:26,260 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=5066550.0, ans=0.025
2024-08-21 03:04:30,646 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 27 from LS+wenet, 28 from Vox, 37 from AS
2024-08-21 03:04:34,685 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=5066550.0, ans=0.95
2024-08-21 03:04:49,600 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 16 from LS+wenet, 16 from Vox, 18 from AS
2024-08-21 03:05:04,351 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=5066650.0, ans=0.0
2024-08-21 03:05:07,352 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 2900, loss[loss=0.1213, beats_loss=0.00813, ecapa_loss=0.0001529, whisper_loss=0.1116, over 22876.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01018, ecapa_loss=0.0001384, whisper_loss=0.09081, over 3822213.26 frames. ], batch size: 90, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 03:05:21,332 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 35 from LS+wenet, 21 from Vox, 34 from AS
2024-08-21 03:05:25,463 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 14 from LS+wenet, 21 from Vox, 27 from AS
2024-08-21 03:05:42,005 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 38 from LS+wenet, 16 from Vox, 37 from AS
2024-08-21 03:05:55,592 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=5066950.0, ans=0.0
2024-08-21 03:06:05,792 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.55 vs. limit=15.0
2024-08-21 03:06:11,972 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-08-21 03:06:26,002 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 29 from LS+wenet, 28 from Vox, 35 from AS
2024-08-21 03:07:10,186 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 2950, loss[loss=0.1062, beats_loss=0.0114, ecapa_loss=0.0001063, whisper_loss=0.09373, over 18235.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01022, ecapa_loss=0.0001386, whisper_loss=0.09044, over 3826994.47 frames. ], batch size: 68, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 03:07:45,032 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.00 vs. limit=15.0
2024-08-21 03:07:50,497 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 31 from LS+wenet, 24 from Vox, 35 from AS
2024-08-21 03:07:51,933 WARNING [optim.py:496] (1/4) Scaling gradients by 0.03678512200713158, model_norm_threshold=50.32452392578125
2024-08-21 03:07:52,091 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.620e+05, grad_sumsq=7.962e+04, orig_rms_sq=3.290e+00
2024-08-21 03:08:15,370 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts.
32 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-21 03:08:19,011 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=5067550.0, ans=0.0 2024-08-21 03:08:19,734 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.280e+01 2.501e+01 2.875e+01 1.368e+03, threshold=5.003e+01, percent-clipped=1.0 2024-08-21 03:08:33,797 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=5067550.0, ans=0.2 2024-08-21 03:08:46,052 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 29 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-21 03:08:53,155 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=5067650.0, ans=0.125 2024-08-21 03:08:55,980 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 25 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-21 03:09:02,339 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 3000, loss[loss=0.0987, beats_loss=0.01021, ecapa_loss=0.0001212, whisper_loss=0.08728, over 17113.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0102, ecapa_loss=0.0001387, whisper_loss=0.09091, over 3806774.24 frames. ], batch size: 64, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:09:02,339 INFO [train_multi_KD3.py:1140] (1/4) Computing validation loss 2024-08-21 03:09:39,404 INFO [train_multi_KD3.py:1150] (1/4) Epoch 35, validation on ASR_libri: loss=0.2546, beats_loss=0, ecapa_loss=0.0005038, whisper_loss=0.2496, over 931116.00 frames. 2024-08-21 03:10:01,690 INFO [train_multi_KD3.py:1150] (1/4) Epoch 35, validation on SV_voxceleb1: loss=0.003899, beats_loss=0, ecapa_loss=0.0003899, whisper_loss=0, over 944235.00 frames. 2024-08-21 03:11:41,892 INFO [train_multi_KD3.py:1150] (1/4) Epoch 35, validation on AT_audioset: loss=0.02313, beats_loss=0.02313, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-21 03:11:41,896 INFO [train_multi_KD3.py:1156] (1/4) Maximum memory allocated so far is 30838MB 2024-08-21 03:12:02,452 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5067850.0, ans=0.125 2024-08-21 03:12:04,934 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5067850.0, ans=0.1 2024-08-21 03:12:10,898 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.55 vs. limit=22.5 2024-08-21 03:12:18,417 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=5067950.0, ans=0.0 2024-08-21 03:12:21,211 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 25 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-21 03:12:37,376 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5068050.0, ans=0.1 2024-08-21 03:12:40,938 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=5068050.0, ans=0.0 2024-08-21 03:13:12,662 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 3050, loss[loss=0.1161, beats_loss=0.009877, ecapa_loss=0.0001513, whisper_loss=0.1047, over 21688.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01019, ecapa_loss=0.0001391, whisper_loss=0.09045, over 3812058.47 frames. ], batch size: 88, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:13:26,241 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 
31 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-21 03:13:28,717 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5068250.0, ans=0.125 2024-08-21 03:13:34,534 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=5068350.0, ans=0.2 2024-08-21 03:13:36,401 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=5068350.0, ans=0.0 2024-08-21 03:13:50,362 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=5068450.0, ans=0.125 2024-08-21 03:14:05,068 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 22 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-21 03:14:08,407 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.251e+01 2.540e+01 2.788e+01 3.733e+01, threshold=5.081e+01, percent-clipped=0.0 2024-08-21 03:14:10,625 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 19 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-21 03:14:25,281 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 17 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-21 03:14:27,550 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.41 vs. limit=15.0 2024-08-21 03:14:31,020 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.04 vs. limit=15.0 2024-08-21 03:14:44,373 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 3100, loss[loss=0.1024, beats_loss=0.009992, ecapa_loss=0.0001635, whisper_loss=0.09078, over 19306.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0102, ecapa_loss=0.0001388, whisper_loss=0.09026, over 3829058.60 frames. 
], batch size: 77, lr: 1.75e-03, grad_scale: 1.152921504606847e+18 2024-08-21 03:14:54,954 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.64 vs. limit=15.0 2024-08-21 03:14:55,650 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 24 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-21 03:14:56,065 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5068750.0, ans=0.125 2024-08-21 03:15:10,618 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=5068850.0, ans=0.0 2024-08-21 03:15:21,543 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=5068950.0, ans=0.0 2024-08-21 03:15:34,736 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5068950.0, ans=0.0 2024-08-21 03:15:50,440 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.66 vs. limit=10.0 2024-08-21 03:15:55,687 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=5069050.0, ans=0.2 2024-08-21 03:16:17,550 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 3150, loss[loss=0.08541, beats_loss=0.01316, ecapa_loss=0.0001338, whisper_loss=0.07091, over 19693.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01025, ecapa_loss=0.0001388, whisper_loss=0.09094, over 3853676.87 frames. ], batch size: 80, lr: 1.75e-03, grad_scale: 1.152921504606847e+18 2024-08-21 03:16:23,631 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5069250.0, ans=0.1 2024-08-21 03:16:35,711 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 
21 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-21 03:16:46,223 INFO [train_multi_KD3.py:845] (1/4) A total of 96 cuts. 34 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-21 03:16:57,330 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 17 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-21 03:17:02,912 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 22 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-21 03:17:11,524 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=5069550.0, ans=0.0 2024-08-21 03:17:12,316 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.040e+01 2.426e+01 2.655e+01 2.939e+01 1.391e+02, threshold=5.310e+01, percent-clipped=2.0 2024-08-21 03:17:15,102 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5069550.0, ans=0.125 2024-08-21 03:17:48,326 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 3200, loss[loss=0.1053, beats_loss=0.009527, ecapa_loss=0.0001456, whisper_loss=0.09428, over 19135.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01035, ecapa_loss=0.0001397, whisper_loss=0.08989, over 3814360.41 frames. ], batch size: 75, lr: 1.75e-03, grad_scale: 1.152921504606847e+18 2024-08-21 03:17:56,148 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=5069750.0, ans=0.0 2024-08-21 03:17:56,222 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=5069750.0, ans=0.0 2024-08-21 03:18:08,665 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 15 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-21 03:18:14,228 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=5069850.0, ans=0.0 2024-08-21 03:18:18,774 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 
14 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-21 03:18:29,332 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.34 vs. limit=22.5 2024-08-21 03:18:30,916 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5069950.0, ans=0.1 2024-08-21 03:18:49,195 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=5070050.0, ans=0.125 2024-08-21 03:18:51,062 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5070050.0, ans=0.125 2024-08-21 03:18:54,942 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=5070050.0, ans=0.07 2024-08-21 03:18:58,319 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=5070050.0, ans=0.125 2024-08-21 03:19:19,346 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 3250, loss[loss=0.08863, beats_loss=0.009656, ecapa_loss=0.0001575, whisper_loss=0.0774, over 15579.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01033, ecapa_loss=0.0001401, whisper_loss=0.09057, over 3799222.51 frames. ], batch size: 63, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:19:29,663 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5070250.0, ans=0.125 2024-08-21 03:19:31,901 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.82 vs. 
limit=6.0 2024-08-21 03:19:56,764 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=5070350.0, ans=0.07 2024-08-21 03:20:11,624 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=5070450.0, ans=0.0 2024-08-21 03:20:24,121 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.311e+01 2.569e+01 2.814e+01 1.085e+02, threshold=5.138e+01, percent-clipped=2.0 2024-08-21 03:20:32,420 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=5070550.0, ans=0.125 2024-08-21 03:20:41,214 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=5070550.0, ans=0.2 2024-08-21 03:21:00,954 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5070650.0, ans=0.125 2024-08-21 03:21:06,341 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 3300, loss[loss=0.1083, beats_loss=0.008432, ecapa_loss=0.0001596, whisper_loss=0.09825, over 22427.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0103, ecapa_loss=0.0001408, whisper_loss=0.09115, over 3829786.59 frames. 
], batch size: 87, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:21:10,988 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=5070750.0, ans=0.09899494936611666 2024-08-21 03:21:38,128 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=5070850.0, ans=0.125 2024-08-21 03:21:42,415 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5070850.0, ans=0.1 2024-08-21 03:22:21,897 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. limit=6.0 2024-08-21 03:22:45,005 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=5071150.0, ans=0.125 2024-08-21 03:22:58,848 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 3350, loss[loss=0.09336, beats_loss=0.01172, ecapa_loss=0.000125, whisper_loss=0.08039, over 15770.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01036, ecapa_loss=0.0001403, whisper_loss=0.09078, over 3851140.64 frames. ], batch size: 64, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:23:08,250 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 30 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-21 03:23:08,550 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=5071250.0, ans=0.95 2024-08-21 03:23:28,940 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5071350.0, ans=0.1 2024-08-21 03:23:29,990 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 
23 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-21 03:23:40,871 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5071350.0, ans=0.1 2024-08-21 03:23:45,995 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.84 vs. limit=10.0 2024-08-21 03:24:09,816 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.752e+01 2.266e+01 2.448e+01 2.718e+01 4.054e+01, threshold=4.896e+01, percent-clipped=0.0 2024-08-21 03:24:15,320 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 17 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-21 03:24:26,074 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5071550.0, ans=0.125 2024-08-21 03:24:33,817 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=5071650.0, ans=0.2 2024-08-21 03:24:43,275 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=5071650.0, ans=0.125 2024-08-21 03:24:43,629 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.26 vs. limit=15.0 2024-08-21 03:24:44,344 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 19 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-21 03:24:44,473 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=5071650.0, ans=0.125 2024-08-21 03:24:45,808 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5071650.0, ans=0.1 2024-08-21 03:24:56,844 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 3400, loss[loss=0.09648, beats_loss=0.01045, ecapa_loss=0.0001656, whisper_loss=0.08438, over 21777.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01034, ecapa_loss=0.0001414, whisper_loss=0.09052, over 3814128.80 frames. ], batch size: 91, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:25:39,866 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 35 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-21 03:25:49,804 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 33 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-21 03:26:31,387 INFO [train_multi_KD3.py:845] (1/4) A total of 54 cuts. 17 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-21 03:26:36,888 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=5072150.0, ans=0.0 2024-08-21 03:26:39,690 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=5072150.0, ans=0.0 2024-08-21 03:26:52,080 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 21 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-21 03:26:57,262 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 3450, loss[loss=0.08137, beats_loss=0.01037, ecapa_loss=0.0001493, whisper_loss=0.06951, over 18037.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01032, ecapa_loss=0.0001407, whisper_loss=0.09053, over 3811075.64 frames. ], batch size: 76, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:27:10,034 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 
27 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-21 03:27:42,068 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=5072350.0, ans=0.0 2024-08-21 03:27:45,203 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5072450.0, ans=0.125 2024-08-21 03:28:09,577 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.771e+01 2.313e+01 2.518e+01 2.811e+01 5.199e+01, threshold=5.037e+01, percent-clipped=1.0 2024-08-21 03:28:31,780 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5072650.0, ans=0.1 2024-08-21 03:28:43,858 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 22 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-21 03:28:52,412 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 3500, loss[loss=0.1027, beats_loss=0.01234, ecapa_loss=0.000137, whisper_loss=0.08894, over 22330.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01032, ecapa_loss=0.0001405, whisper_loss=0.09009, over 3825296.73 frames. ], batch size: 91, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:28:58,672 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5072750.0, ans=0.125 2024-08-21 03:29:00,913 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=5072750.0, ans=0.125 2024-08-21 03:29:07,220 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.73 vs. limit=12.0 2024-08-21 03:29:14,546 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.93 vs. 
limit=22.5 2024-08-21 03:29:17,919 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=5072850.0, ans=0.0 2024-08-21 03:29:22,975 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 24 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-21 03:29:34,228 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 12 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-21 03:29:42,356 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-21 03:30:29,595 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=5073150.0, ans=0.0 2024-08-21 03:30:40,770 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=5073150.0, ans=0.125 2024-08-21 03:30:44,515 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 3550, loss[loss=0.1003, beats_loss=0.009922, ecapa_loss=0.000132, whisper_loss=0.08907, over 19176.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01033, ecapa_loss=0.0001402, whisper_loss=0.08968, over 3819246.94 frames. ], batch size: 75, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:30:50,207 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=5073250.0, ans=0.035 2024-08-21 03:31:02,816 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=5073250.0, ans=0.04949747468305833 2024-08-21 03:31:12,349 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. 
limit=6.0 2024-08-21 03:31:52,535 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.266e+01 2.535e+01 2.803e+01 1.045e+02, threshold=5.070e+01, percent-clipped=1.0 2024-08-21 03:32:06,564 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5073550.0, ans=0.125 2024-08-21 03:32:35,459 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=5073650.0, ans=0.0 2024-08-21 03:32:35,500 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=5073650.0, ans=0.0 2024-08-21 03:32:38,818 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 3600, loss[loss=0.08301, beats_loss=0.01261, ecapa_loss=0.0001194, whisper_loss=0.0692, over 22307.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01046, ecapa_loss=0.0001392, whisper_loss=0.08856, over 3807992.12 frames. ], batch size: 91, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:33:02,441 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=5073850.0, ans=0.2 2024-08-21 03:33:14,330 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5073850.0, ans=0.0 2024-08-21 03:33:26,220 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=5073950.0, ans=0.125 2024-08-21 03:33:46,923 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=5073950.0, ans=0.125 2024-08-21 03:33:52,312 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 
34 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-21 03:34:35,476 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 3650, loss[loss=0.1272, beats_loss=0.008023, ecapa_loss=0.0001433, whisper_loss=0.1178, over 19805.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01036, ecapa_loss=0.0001387, whisper_loss=0.08969, over 3809292.01 frames. ], batch size: 75, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:34:56,562 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=5074250.0, ans=0.125 2024-08-21 03:35:04,666 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 29 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-21 03:35:48,662 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.270e+01 2.492e+01 2.659e+01 4.040e+01, threshold=4.984e+01, percent-clipped=0.0 2024-08-21 03:35:58,786 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5074550.0, ans=0.125 2024-08-21 03:36:11,263 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=5074650.0, ans=0.125 2024-08-21 03:36:21,487 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0 2024-08-21 03:36:28,202 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5074650.0, ans=0.125 2024-08-21 03:36:34,368 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 3700, loss[loss=0.1178, beats_loss=0.01049, ecapa_loss=0.0001388, whisper_loss=0.1059, over 21277.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01036, ecapa_loss=0.0001389, whisper_loss=0.08981, over 3784849.83 frames. 
], batch size: 84, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:36:53,999 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.71 vs. limit=15.0 2024-08-21 03:36:54,777 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 21 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-21 03:37:05,880 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=5074850.0, ans=0.125 2024-08-21 03:37:11,117 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 15 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-21 03:37:14,858 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0 2024-08-21 03:37:27,956 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 28 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-21 03:37:42,520 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.63 vs. limit=15.0 2024-08-21 03:38:34,074 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 3750, loss[loss=0.07, beats_loss=0.01338, ecapa_loss=0.00015, whisper_loss=0.05512, over 13538.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01031, ecapa_loss=0.000139, whisper_loss=0.09001, over 3740504.61 frames. 
], batch size: 57, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 03:39:10,196 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=5075350.0, ans=0.2
2024-08-21 03:39:27,551 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=5075450.0, ans=0.0
2024-08-21 03:39:39,855 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.10 vs. limit=22.5
2024-08-21 03:39:50,454 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.197e+01 2.452e+01 2.774e+01 3.553e+01, threshold=4.904e+01, percent-clipped=0.0
2024-08-21 03:40:24,429 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.72 vs. limit=15.0
2024-08-21 03:40:35,819 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 3800, loss[loss=0.09524, beats_loss=0.009576, ecapa_loss=0.0001441, whisper_loss=0.08422, over 16851.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01029, ecapa_loss=0.0001383, whisper_loss=0.09013, over 3745990.51 frames. ], batch size: 67, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 03:40:36,062 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 22 from LS+wenet, 10 from Vox, 19 fro AS
2024-08-21 03:40:43,426 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 20 from LS+wenet, 15 from Vox, 29 fro AS
2024-08-21 03:41:04,765 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=5075850.0, ans=0.0
2024-08-21 03:41:07,808 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=5075850.0, ans=0.0
2024-08-21 03:41:32,499 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=5075950.0, ans=0.0
2024-08-21 03:42:08,219 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.914e+05
2024-08-21 03:42:38,122 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 3850, loss[loss=0.09457, beats_loss=0.01034, ecapa_loss=0.0001576, whisper_loss=0.08265, over 21340.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01021, ecapa_loss=0.0001391, whisper_loss=0.09073, over 3772120.19 frames. ], batch size: 88, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 03:42:46,641 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=5076250.0, ans=0.035
2024-08-21 03:42:52,490 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.27 vs. limit=6.0
2024-08-21 03:42:54,817 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.94 vs. limit=15.0
2024-08-21 03:43:03,880 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=5076350.0, ans=0.0
2024-08-21 03:43:50,302 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5076550.0, ans=0.1
2024-08-21 03:43:50,944 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.816e+01 2.237e+01 2.453e+01 2.696e+01 3.570e+01, threshold=4.906e+01, percent-clipped=0.0
2024-08-21 03:43:55,556 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=5076550.0, ans=0.04949747468305833
2024-08-21 03:43:59,460 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=5076550.0, ans=0.125
2024-08-21 03:44:20,040 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.23 vs. limit=15.0
2024-08-21 03:44:30,826 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 28 from LS+wenet, 30 from Vox, 32 fro AS
2024-08-21 03:44:32,663 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 21 from LS+wenet, 26 from Vox, 26 fro AS
2024-08-21 03:44:38,208 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 3900, loss[loss=0.07825, beats_loss=0.0119, ecapa_loss=0.0001369, whisper_loss=0.06498, over 19886.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0103, ecapa_loss=0.0001395, whisper_loss=0.08965, over 3770782.79 frames. ], batch size: 82, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 03:44:41,861 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-21 03:45:15,208 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 21 from LS+wenet, 9 from Vox, 37 fro AS
2024-08-21 03:45:41,408 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=5076950.0, ans=0.0
2024-08-21 03:45:50,824 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.63 vs. limit=12.0
2024-08-21 03:46:11,292 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5077150.0, ans=0.1
2024-08-21 03:46:28,179 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.55 vs. limit=15.0
2024-08-21 03:46:39,105 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 3950, loss[loss=0.1115, beats_loss=0.009548, ecapa_loss=0.0001184, whisper_loss=0.1007, over 16784.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01032, ecapa_loss=0.0001398, whisper_loss=0.08943, over 3803897.89 frames. ], batch size: 62, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 03:46:40,772 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=5077250.0, ans=0.0
2024-08-21 03:46:53,675 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 23 from LS+wenet, 12 from Vox, 40 fro AS
2024-08-21 03:47:07,689 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.68 vs. limit=15.0
2024-08-21 03:47:44,200 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=5077450.0, ans=0.0
2024-08-21 03:47:48,787 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.92 vs. limit=15.0
2024-08-21 03:47:51,688 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.265e+01 2.547e+01 2.985e+01 6.857e+01, threshold=5.095e+01, percent-clipped=1.0
2024-08-21 03:48:05,199 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=5077550.0, ans=0.025
2024-08-21 03:48:15,952 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 25 from LS+wenet, 15 from Vox, 42 fro AS
2024-08-21 03:48:32,864 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=5077650.0, ans=0.0
2024-08-21 03:48:36,692 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 30 from LS+wenet, 25 from Vox, 34 fro AS
2024-08-21 03:48:39,324 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 4000, loss[loss=0.1058, beats_loss=0.00954, ecapa_loss=0.000175, whisper_loss=0.09455, over 21970.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01037, ecapa_loss=0.0001401, whisper_loss=0.08916, over 3849553.92 frames. ], batch size: 89, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 03:48:48,188 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5077750.0, ans=0.1
2024-08-21 03:48:56,261 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 25 from LS+wenet, 26 from Vox, 40 fro AS
2024-08-21 03:49:07,703 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5077850.0, ans=0.125
2024-08-21 03:49:08,172 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.93 vs. limit=15.0
2024-08-21 03:49:12,393 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5077850.0, ans=0.1
2024-08-21 03:49:20,551 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=10.13 vs. limit=10.0
2024-08-21 03:49:34,867 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 25 from LS+wenet, 27 from Vox, 34 fro AS
2024-08-21 03:49:42,357 INFO [train_multi_KD3.py:845] (1/4) A total of 54 cuts. 15 from LS+wenet, 15 from Vox, 24 fro AS
2024-08-21 03:50:42,824 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5078150.0, ans=0.125
2024-08-21 03:50:48,412 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.49 vs. limit=15.0
2024-08-21 03:50:48,815 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 4050, loss[loss=0.0909, beats_loss=0.01104, ecapa_loss=0.0001233, whisper_loss=0.07863, over 14029.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01031, ecapa_loss=0.0001405, whisper_loss=0.08938, over 3821660.87 frames. ], batch size: 55, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 03:51:15,900 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.70 vs. limit=10.0
2024-08-21 03:51:21,799 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 22 from LS+wenet, 16 from Vox, 37 fro AS
2024-08-21 03:51:27,420 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 17 from LS+wenet, 23 from Vox, 49 fro AS
2024-08-21 03:51:32,615 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 21 from LS+wenet, 17 from Vox, 27 fro AS
2024-08-21 03:52:08,572 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 18 from LS+wenet, 18 from Vox, 26 fro AS
2024-08-21 03:52:09,693 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=5078550.0, ans=0.0
2024-08-21 03:52:10,507 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.320e+01 2.567e+01 2.893e+01 7.952e+01, threshold=5.134e+01, percent-clipped=3.0
2024-08-21 03:52:43,653 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 21 from LS+wenet, 22 from Vox, 31 fro AS
2024-08-21 03:52:57,024 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.77 vs. limit=15.0
2024-08-21 03:52:58,585 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=5078650.0, ans=0.125
2024-08-21 03:53:01,430 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 4100, loss[loss=0.07853, beats_loss=0.009914, ecapa_loss=0.0001705, whisper_loss=0.06691, over 15417.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01039, ecapa_loss=0.000139, whisper_loss=0.08881, over 3839317.19 frames. ], batch size: 66, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 03:53:09,449 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 21 from LS+wenet, 27 from Vox, 39 fro AS
2024-08-21 03:53:16,243 WARNING [optim.py:496] (1/4) Scaling gradients by 0.0443761944770813, model_norm_threshold=51.335693359375
2024-08-21 03:53:16,401 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.20, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.724e+05, grad_sumsq=2.524e+07, orig_rms_sq=1.079e-02
2024-08-21 03:53:20,620 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.06 vs. limit=15.0
2024-08-21 03:53:21,382 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 20 from LS+wenet, 18 from Vox, 30 fro AS
2024-08-21 03:53:27,832 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=5078850.0, ans=0.2
2024-08-21 03:53:34,565 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 26 from LS+wenet, 24 from Vox, 29 fro AS
2024-08-21 03:53:41,103 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=5078850.0, ans=0.2
2024-08-21 03:53:57,134 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=5078950.0, ans=0.0
2024-08-21 03:53:58,520 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=5078950.0, ans=0.015
2024-08-21 03:54:21,747 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 36 from LS+wenet, 17 from Vox, 39 fro AS
2024-08-21 03:54:23,898 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.79 vs. limit=15.0
2024-08-21 03:54:41,168 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=5079050.0, ans=0.0
2024-08-21 03:54:46,776 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5079150.0, ans=0.1
2024-08-21 03:55:04,459 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=5079150.0, ans=0.125
2024-08-21 03:55:10,837 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 4150, loss[loss=0.1131, beats_loss=0.01068, ecapa_loss=0.0001472, whisper_loss=0.101, over 21398.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01042, ecapa_loss=0.0001387, whisper_loss=0.08939, over 3858305.00 frames. ], batch size: 86, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 03:56:01,299 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=5079450.0, ans=0.0
2024-08-21 03:56:03,733 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5079450.0, ans=0.125
2024-08-21 03:56:29,505 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 21 from LS+wenet, 18 from Vox, 42 fro AS
2024-08-21 03:56:31,588 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=5079550.0, ans=0.0
2024-08-21 03:56:31,818 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.38 vs. limit=22.5
2024-08-21 03:56:32,407 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.800e+01 2.340e+01 2.592e+01 2.879e+01 1.157e+03, threshold=5.184e+01, percent-clipped=4.0
2024-08-21 03:56:35,661 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 20 from LS+wenet, 19 from Vox, 20 fro AS
2024-08-21 03:56:35,972 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=5079550.0, ans=0.2
2024-08-21 03:56:38,291 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=5079550.0, ans=0.125
2024-08-21 03:56:39,246 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 25 from LS+wenet, 17 from Vox, 47 fro AS
2024-08-21 03:56:43,959 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.93 vs. limit=15.0
2024-08-21 03:57:04,456 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=5079650.0, ans=0.2
2024-08-21 03:57:08,000 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.22 vs. limit=22.5
2024-08-21 03:57:15,916 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 21 from LS+wenet, 11 from Vox, 26 fro AS
2024-08-21 03:57:17,866 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 4200, loss[loss=0.109, beats_loss=0.01032, ecapa_loss=0.0001432, whisper_loss=0.09729, over 14514.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01046, ecapa_loss=0.0001381, whisper_loss=0.08975, over 3843446.15 frames. ], batch size: 58, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 03:57:19,389 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 25 from LS+wenet, 31 from Vox, 32 fro AS
2024-08-21 03:57:23,323 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.34 vs. limit=6.0
2024-08-21 03:57:25,902 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 24 from LS+wenet, 13 from Vox, 43 fro AS
2024-08-21 03:57:40,442 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.67 vs. limit=15.0
2024-08-21 03:58:14,127 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5079950.0, ans=0.125
2024-08-21 03:58:19,630 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.93 vs. limit=22.5
2024-08-21 03:59:04,742 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.96 vs. limit=12.0
2024-08-21 03:59:06,830 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=5080150.0, ans=0.0
2024-08-21 03:59:20,459 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 4250, loss[loss=0.1024, beats_loss=0.01061, ecapa_loss=0.0001341, whisper_loss=0.09045, over 12938.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01047, ecapa_loss=0.0001392, whisper_loss=0.08912, over 3804755.64 frames. ], batch size: 51, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 03:59:30,530 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=5080250.0, ans=0.125
2024-08-21 03:59:32,843 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=5080250.0, ans=0.0
2024-08-21 03:59:59,256 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 30 from LS+wenet, 22 from Vox, 39 fro AS
2024-08-21 04:00:36,046 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 19 from LS+wenet, 19 from Vox, 37 fro AS
2024-08-21 04:00:39,911 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5080550.0, ans=0.125
2024-08-21 04:00:40,591 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.302e+01 2.518e+01 2.832e+01 1.053e+02, threshold=5.035e+01, percent-clipped=1.0
2024-08-21 04:01:27,098 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 22 from LS+wenet, 21 from Vox, 37 fro AS
2024-08-21 04:01:29,534 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 4300, loss[loss=0.09382, beats_loss=0.01121, ecapa_loss=0.0001274, whisper_loss=0.08133, over 20150.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01052, ecapa_loss=0.0001378, whisper_loss=0.08944, over 3835542.05 frames. ], batch size: 80, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 04:02:39,952 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.21 vs. limit=15.0
2024-08-21 04:02:57,867 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5081050.0, ans=0.125
2024-08-21 04:03:04,385 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=5081050.0, ans=0.0
2024-08-21 04:03:30,183 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 4350, loss[loss=0.119, beats_loss=0.01008, ecapa_loss=0.0001275, whisper_loss=0.1077, over 23514.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01046, ecapa_loss=0.0001383, whisper_loss=0.08982, over 3830509.64 frames. ], batch size: 92, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 04:03:34,763 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=5081250.0, ans=0.0
2024-08-21 04:03:59,099 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5081350.0, ans=0.125
2024-08-21 04:04:03,310 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.28 vs. limit=12.0
2024-08-21 04:04:09,097 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=5081350.0, ans=0.0
2024-08-21 04:04:20,681 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 12 from LS+wenet, 14 from Vox, 24 fro AS
2024-08-21 04:04:28,978 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=5081450.0, ans=0.0
2024-08-21 04:04:37,004 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.693e+01 2.205e+01 2.430e+01 2.775e+01 4.634e+01, threshold=4.861e+01, percent-clipped=0.0
2024-08-21 04:04:41,106 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5081550.0, ans=0.1
2024-08-21 04:05:16,840 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5081650.0, ans=0.1
2024-08-21 04:05:19,890 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 4400, loss[loss=0.1111, beats_loss=0.01036, ecapa_loss=0.0001076, whisper_loss=0.09971, over 21921.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01043, ecapa_loss=0.0001376, whisper_loss=0.08904, over 3786919.85 frames. ], batch size: 86, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 04:05:21,327 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5081750.0, ans=0.0
2024-08-21 04:05:39,512 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 22 from LS+wenet, 14 from Vox, 26 fro AS
2024-08-21 04:05:45,889 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 30 from LS+wenet, 27 from Vox, 31 fro AS
2024-08-21 04:05:49,963 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5081850.0, ans=0.1
2024-08-21 04:05:50,963 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 37 from LS+wenet, 18 from Vox, 34 fro AS
2024-08-21 04:05:56,729 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 28 from LS+wenet, 21 from Vox, 41 fro AS
2024-08-21 04:06:21,053 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 19 from LS+wenet, 20 from Vox, 40 fro AS
2024-08-21 04:06:29,010 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=5081950.0, ans=0.125
2024-08-21 04:06:29,445 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.55 vs. limit=22.5
2024-08-21 04:07:06,692 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=5082150.0, ans=0.2
2024-08-21 04:07:20,366 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5082150.0, ans=0.0
2024-08-21 04:07:31,850 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 4450, loss[loss=0.09706, beats_loss=0.009344, ecapa_loss=0.0001209, whisper_loss=0.08651, over 14111.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01037, ecapa_loss=0.0001374, whisper_loss=0.08993, over 3781802.36 frames. ], batch size: 52, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 04:07:53,909 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 21 from LS+wenet, 21 from Vox, 15 fro AS
2024-08-21 04:08:00,865 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5082350.0, ans=0.1
2024-08-21 04:08:43,394 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.27 vs. limit=15.0
2024-08-21 04:08:51,753 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.213e+01 2.422e+01 2.731e+01 3.413e+01, threshold=4.845e+01, percent-clipped=0.0
2024-08-21 04:09:03,603 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=5082550.0, ans=0.2
2024-08-21 04:09:42,526 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 4500, loss[loss=0.1052, beats_loss=0.008382, ecapa_loss=0.0001538, whisper_loss=0.09532, over 16862.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01034, ecapa_loss=0.0001365, whisper_loss=0.09038, over 3781741.27 frames. ], batch size: 64, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 04:09:50,399 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.46 vs. limit=15.0
2024-08-21 04:09:56,525 INFO [train_multi_KD3.py:845] (1/4) A total of 54 cuts. 17 from LS+wenet, 16 from Vox, 21 fro AS
2024-08-21 04:09:58,685 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 19 from LS+wenet, 16 from Vox, 22 fro AS
2024-08-21 04:10:23,197 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 12 from LS+wenet, 15 from Vox, 26 fro AS
2024-08-21 04:10:36,277 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.01 vs. limit=15.0
2024-08-21 04:10:36,348 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.02 vs. limit=6.0
2024-08-21 04:10:52,971 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5082950.0, ans=0.1
2024-08-21 04:11:05,001 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 23 from LS+wenet, 23 from Vox, 38 fro AS
2024-08-21 04:11:28,054 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 16 from LS+wenet, 22 from Vox, 29 fro AS
2024-08-21 04:11:30,968 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 28 from LS+wenet, 17 from Vox, 32 fro AS
2024-08-21 04:11:33,431 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 27 from LS+wenet, 23 from Vox, 41 fro AS
2024-08-21 04:11:37,495 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5083150.0, ans=0.125
2024-08-21 04:11:47,208 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=5083150.0, ans=0.07
2024-08-21 04:11:53,286 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 4550, loss[loss=0.0876, beats_loss=0.01174, ecapa_loss=0.0001366, whisper_loss=0.07449, over 17597.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01035, ecapa_loss=0.0001371, whisper_loss=0.08981, over 3786351.61 frames. ], batch size: 73, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 04:12:36,312 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=5083350.0, ans=0.125
2024-08-21 04:12:59,453 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.66 vs. limit=12.0
2024-08-21 04:13:13,995 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.647e+01 2.342e+01 2.629e+01 2.950e+01 5.025e+01, threshold=5.258e+01, percent-clipped=1.0
2024-08-21 04:13:15,050 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=5083550.0, ans=0.0
2024-08-21 04:13:37,962 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5083650.0, ans=0.0
2024-08-21 04:13:40,341 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5083650.0, ans=0.1
2024-08-21 04:14:04,393 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 4600, loss[loss=0.1205, beats_loss=0.008577, ecapa_loss=0.0001481, whisper_loss=0.1104, over 20401.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01041, ecapa_loss=0.0001368, whisper_loss=0.08966, over 3799336.81 frames. ], batch size: 81, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 04:14:06,370 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.33 vs. limit=10.0
2024-08-21 04:14:35,252 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5083850.0, ans=0.125
2024-08-21 04:14:53,798 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=5083950.0, ans=0.125
2024-08-21 04:14:57,344 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=5083950.0, ans=0.125
2024-08-21 04:14:58,312 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 31 from LS+wenet, 28 from Vox, 27 fro AS
2024-08-21 04:15:32,857 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 21 from LS+wenet, 21 from Vox, 28 fro AS
2024-08-21 04:15:37,236 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5084050.0, ans=0.125
2024-08-21 04:15:37,817 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.19 vs. limit=15.0
2024-08-21 04:15:48,414 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=5084150.0, ans=0.125
2024-08-21 04:16:07,702 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 4650, loss[loss=0.09066, beats_loss=0.01303, ecapa_loss=0.0001067, whisper_loss=0.07656, over 19732.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01036, ecapa_loss=0.0001388, whisper_loss=0.08977, over 3805289.82 frames. ], batch size: 78, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 04:16:22,110 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.31 vs. limit=12.0
2024-08-21 04:16:39,563 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 10 from LS+wenet, 11 from Vox, 30 fro AS
2024-08-21 04:17:17,738 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 15 from LS+wenet, 15 from Vox, 25 fro AS
2024-08-21 04:17:24,206 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5084550.0, ans=0.125
2024-08-21 04:17:27,900 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.290e+01 2.552e+01 2.874e+01 1.481e+02, threshold=5.104e+01, percent-clipped=2.0
2024-08-21 04:17:54,907 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 21 from LS+wenet, 20 from Vox, 41 fro AS
2024-08-21 04:18:14,916 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 4700, loss[loss=0.1041, beats_loss=0.009788, ecapa_loss=0.0001293, whisper_loss=0.09299, over 19720.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01044, ecapa_loss=0.0001384, whisper_loss=0.08876, over 3808866.55 frames. ], batch size: 78, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 04:18:18,761 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.669e+00
2024-08-21 04:18:27,282 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 17 from LS+wenet, 18 from Vox, 27 fro AS
2024-08-21 04:18:37,881 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 20 from LS+wenet, 10 from Vox, 29 fro AS
2024-08-21 04:19:18,900 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 19 from LS+wenet, 17 from Vox, 29 fro AS
2024-08-21 04:19:39,569 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 19 from LS+wenet, 13 from Vox, 24 fro AS
2024-08-21 04:19:44,769 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 26 from LS+wenet, 27 from Vox, 39 fro AS
2024-08-21 04:19:46,763 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.49 vs. limit=15.0
2024-08-21 04:19:50,224 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 23 from LS+wenet, 24 from Vox, 35 fro AS
2024-08-21 04:19:52,351 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 23 from LS+wenet, 24 from Vox, 36 fro AS
2024-08-21 04:20:25,050 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 4750, loss[loss=0.1218, beats_loss=0.009027, ecapa_loss=0.0001274, whisper_loss=0.1115, over 18038.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.0104, ecapa_loss=0.0001397, whisper_loss=0.08896, over 3787042.31 frames. ], batch size: 68, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 04:20:27,699 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=5085250.0, ans=0.0
2024-08-21 04:20:32,100 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=5085250.0, ans=0.0
2024-08-21 04:20:51,573 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=5085350.0, ans=0.0
2024-08-21 04:20:57,619 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5085350.0, ans=0.1
2024-08-21 04:21:32,698 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 27 from LS+wenet, 16 from Vox, 28 fro AS
2024-08-21 04:21:44,308 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.817e+01 2.249e+01 2.433e+01 2.723e+01 6.483e+01, threshold=4.865e+01, percent-clipped=1.0
2024-08-21 04:21:46,932 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.83 vs. limit=15.0
2024-08-21 04:22:01,602 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.46 vs. limit=15.0
2024-08-21 04:22:11,839 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 22 from LS+wenet, 19 from Vox, 34 fro AS
2024-08-21 04:22:33,054 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 4800, loss[loss=0.1056, beats_loss=0.007617, ecapa_loss=0.0001774, whisper_loss=0.09621, over 15571.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01038, ecapa_loss=0.0001404, whisper_loss=0.08888, over 3748472.20 frames. ], batch size: 64, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 04:22:39,975 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=5085750.0, ans=0.0
2024-08-21 04:23:38,056 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=5085950.0, ans=0.125
2024-08-21 04:23:42,971 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5085950.0, ans=0.125
2024-08-21 04:23:57,793 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 26 from LS+wenet, 24 from Vox, 27 fro AS
2024-08-21 04:23:59,950 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5086050.0, ans=0.1
2024-08-21 04:24:07,031 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5086050.0, ans=0.1
2024-08-21 04:24:13,081 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=5086050.0, ans=0.125
2024-08-21 04:24:22,181 INFO [train_multi_KD3.py:845] (1/4) A total of 52 cuts. 16 from LS+wenet, 14 from Vox, 22 fro AS
2024-08-21 04:24:33,280 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.32 vs. limit=6.0
2024-08-21 04:24:35,544 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=5086150.0, ans=0.125
2024-08-21 04:24:39,285 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 4850, loss[loss=0.09108, beats_loss=0.01035, ecapa_loss=0.0001177, whisper_loss=0.07955, over 20722.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.0104, ecapa_loss=0.0001398, whisper_loss=0.08838, over 3772595.73 frames. ], batch size: 80, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 04:24:46,611 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.92 vs. limit=12.0
2024-08-21 04:24:56,339 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5086250.0, ans=0.1
2024-08-21 04:25:02,833 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 23 from LS+wenet, 22 from Vox, 37 fro AS
2024-08-21 04:25:21,735 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 28 from LS+wenet, 21 from Vox, 39 fro AS
2024-08-21 04:25:24,450 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 16 from LS+wenet, 22 from Vox, 27 fro AS
2024-08-21 04:25:50,449 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5086450.0, ans=0.125
2024-08-21 04:25:56,101 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.255e+01 2.433e+01 2.647e+01 4.364e+01, threshold=4.866e+01, percent-clipped=0.0
2024-08-21 04:25:56,368 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts.
25 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-21 04:26:04,466 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=5086550.0, ans=0.125 2024-08-21 04:26:07,597 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=5086550.0, ans=0.125 2024-08-21 04:26:14,235 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=5086550.0, ans=0.0 2024-08-21 04:26:18,690 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5086650.0, ans=0.125 2024-08-21 04:26:41,746 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-21 04:26:42,412 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 4900, loss[loss=0.1003, beats_loss=0.008872, ecapa_loss=0.0001557, whisper_loss=0.08984, over 20568.00 frames. ], tot_loss[loss=0.1, beats_loss=0.01036, ecapa_loss=0.0001399, whisper_loss=0.08825, over 3792315.32 frames. ], batch size: 85, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:27:02,465 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.12 vs. limit=15.0 2024-08-21 04:27:15,189 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.73 vs. limit=15.0 2024-08-21 04:28:08,314 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 21 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-21 04:28:15,225 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=5087050.0, ans=0.0 2024-08-21 04:28:40,691 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 
25 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-21 04:28:51,956 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=5087250.0, ans=0.2 2024-08-21 04:28:52,726 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 4950, loss[loss=0.08035, beats_loss=0.01173, ecapa_loss=0.0001261, whisper_loss=0.06736, over 21379.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01043, ecapa_loss=0.0001385, whisper_loss=0.08872, over 3810296.71 frames. ], batch size: 83, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:29:03,237 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. limit=6.0 2024-08-21 04:29:05,800 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 23 from LS+wenet, 19 from Vox, 17 fro AS 2024-08-21 04:29:09,059 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 24 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-21 04:29:10,332 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5087250.0, ans=0.1 2024-08-21 04:29:13,742 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.85 vs. limit=15.0 2024-08-21 04:29:23,754 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=5087350.0, ans=0.2 2024-08-21 04:29:41,118 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 19 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-21 04:30:14,706 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.734e+01 2.296e+01 2.472e+01 2.859e+01 4.220e+01, threshold=4.943e+01, percent-clipped=0.0 2024-08-21 04:30:20,277 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 
28 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-21 04:30:48,809 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 33 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-21 04:31:04,948 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 5000, loss[loss=0.1135, beats_loss=0.008499, ecapa_loss=0.0001502, whisper_loss=0.1035, over 18981.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01042, ecapa_loss=0.000139, whisper_loss=0.08898, over 3815582.80 frames. ], batch size: 76, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:31:15,748 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=5087750.0, ans=0.0 2024-08-21 04:31:55,868 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5087950.0, ans=0.1 2024-08-21 04:31:57,859 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.38 vs. limit=10.0 2024-08-21 04:32:09,227 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 17 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-21 04:32:20,391 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=5088050.0, ans=0.2 2024-08-21 04:32:28,864 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5088050.0, ans=0.125 2024-08-21 04:32:38,124 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 19 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-21 04:33:08,079 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 5050, loss[loss=0.1079, beats_loss=0.009085, ecapa_loss=0.0002019, whisper_loss=0.09677, over 16745.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01037, ecapa_loss=0.0001392, whisper_loss=0.08904, over 3776116.26 frames. 
], batch size: 70, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:33:09,974 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5088250.0, ans=0.125 2024-08-21 04:33:16,888 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=5088250.0, ans=0.0 2024-08-21 04:33:32,774 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=5088350.0, ans=0.2 2024-08-21 04:33:47,808 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=5088350.0, ans=0.125 2024-08-21 04:34:06,117 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=5088450.0, ans=0.015 2024-08-21 04:34:09,403 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=5088450.0, ans=0.0 2024-08-21 04:34:12,166 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5088450.0, ans=0.125 2024-08-21 04:34:20,858 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 
27 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-21 04:34:22,001 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5088550.0, ans=0.125 2024-08-21 04:34:24,971 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.738e+01 2.200e+01 2.422e+01 2.716e+01 3.329e+01, threshold=4.844e+01, percent-clipped=0.0 2024-08-21 04:34:30,606 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=5088550.0, ans=0.0 2024-08-21 04:35:10,903 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 5100, loss[loss=0.09577, beats_loss=0.0117, ecapa_loss=0.0001209, whisper_loss=0.08286, over 18335.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01046, ecapa_loss=0.0001386, whisper_loss=0.08906, over 3795778.94 frames. ], batch size: 71, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:36:10,815 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5088950.0, ans=0.125 2024-08-21 04:36:31,005 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=5089050.0, ans=0.2 2024-08-21 04:36:36,284 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=5089050.0, ans=0.0 2024-08-21 04:37:04,513 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=5089250.0, ans=0.0 2024-08-21 04:37:05,331 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 5150, loss[loss=0.09072, beats_loss=0.01347, ecapa_loss=0.0001019, whisper_loss=0.07623, over 22504.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01046, ecapa_loss=0.0001385, whisper_loss=0.08961, over 3805612.67 frames. 
], batch size: 89, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:37:16,490 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=5089250.0, ans=0.0 2024-08-21 04:37:37,664 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=5089350.0, ans=0.125 2024-08-21 04:37:57,651 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5089450.0, ans=0.125 2024-08-21 04:37:58,633 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 27 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-21 04:38:12,360 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.286e+01 2.550e+01 3.060e+01 1.523e+02, threshold=5.101e+01, percent-clipped=5.0 2024-08-21 04:38:29,368 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=5089550.0, ans=0.0 2024-08-21 04:38:40,555 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 22 from LS+wenet, 36 from Vox, 30 fro AS 2024-08-21 04:38:41,590 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0 2024-08-21 04:38:56,233 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 5200, loss[loss=0.06941, beats_loss=0.01368, ecapa_loss=0.0001311, whisper_loss=0.05442, over 14971.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01061, ecapa_loss=0.0001373, whisper_loss=0.08898, over 3831981.03 frames. ], batch size: 62, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:38:56,471 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 
24 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-21 04:39:01,825 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=5089750.0, ans=0.05 2024-08-21 04:39:04,877 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 30 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-21 04:39:09,547 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 15 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-21 04:39:43,847 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 21 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-21 04:39:44,902 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.71 vs. limit=15.0 2024-08-21 04:39:51,647 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5089950.0, ans=0.125 2024-08-21 04:40:15,395 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.95 vs. limit=5.0 2024-08-21 04:40:18,874 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5090050.0, ans=0.125 2024-08-21 04:40:47,199 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 5250, loss[loss=0.101, beats_loss=0.009686, ecapa_loss=0.0001566, whisper_loss=0.08973, over 18950.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01059, ecapa_loss=0.0001383, whisper_loss=0.08925, over 3831785.46 frames. 
], batch size: 76, lr: 1.75e-03, grad_scale: 1.152921504606847e+18 2024-08-21 04:40:51,182 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5090250.0, ans=0.125 2024-08-21 04:41:11,512 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5090350.0, ans=0.125 2024-08-21 04:41:11,520 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=5090350.0, ans=0.0 2024-08-21 04:41:27,844 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5090350.0, ans=0.1 2024-08-21 04:41:36,718 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=5090450.0, ans=0.125 2024-08-21 04:41:58,050 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.269e+01 2.486e+01 2.907e+01 3.986e+01, threshold=4.971e+01, percent-clipped=0.0 2024-08-21 04:41:59,613 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=5090550.0, ans=10.0 2024-08-21 04:42:17,484 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 14 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-21 04:42:21,936 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 20 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-21 04:42:29,405 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 
21 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-21 04:42:32,457 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5090650.0, ans=0.125 2024-08-21 04:42:41,931 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5090750.0, ans=0.0 2024-08-21 04:42:42,839 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 5300, loss[loss=0.0907, beats_loss=0.0108, ecapa_loss=0.0001378, whisper_loss=0.07852, over 14268.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01053, ecapa_loss=0.0001389, whisper_loss=0.08938, over 3806917.13 frames. ], batch size: 58, lr: 1.75e-03, grad_scale: 1.152921504606847e+18 2024-08-21 04:43:01,912 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.16 vs. limit=15.0 2024-08-21 04:43:04,350 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=5090850.0, ans=0.125 2024-08-21 04:43:26,530 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=5090850.0, ans=0.125 2024-08-21 04:43:30,609 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 15 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-21 04:43:33,449 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 21 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-21 04:43:36,467 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=5090950.0, ans=0.0 2024-08-21 04:44:24,914 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.83 vs. 
limit=15.0 2024-08-21 04:44:31,602 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=5091150.0, ans=0.0 2024-08-21 04:44:37,866 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 23 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-21 04:44:42,685 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 5350, loss[loss=0.07352, beats_loss=0.01231, ecapa_loss=0.0001366, whisper_loss=0.05984, over 20964.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01048, ecapa_loss=0.0001397, whisper_loss=0.0889, over 3783714.65 frames. ], batch size: 86, lr: 1.75e-03, grad_scale: 1.152921504606847e+18 2024-08-21 04:45:01,600 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=5091250.0, ans=0.0 2024-08-21 04:45:09,292 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5091350.0, ans=0.125 2024-08-21 04:45:15,202 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 23 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-21 04:45:39,622 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5091450.0, ans=0.125 2024-08-21 04:45:47,564 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=5091450.0, ans=0.2 2024-08-21 04:46:00,045 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.175e+01 2.396e+01 2.648e+01 3.168e+01, threshold=4.792e+01, percent-clipped=0.0 2024-08-21 04:46:48,408 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 5400, loss[loss=0.05428, beats_loss=0.0129, ecapa_loss=0.0001301, whisper_loss=0.04008, over 13076.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01039, ecapa_loss=0.0001401, whisper_loss=0.08899, over 3801030.38 frames. 
], batch size: 54, lr: 1.75e-03, grad_scale: 1.152921504606847e+18 2024-08-21 04:47:02,887 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 22 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-21 04:47:27,827 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 20 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-21 04:47:32,673 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 22 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-21 04:47:44,467 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=5091950.0, ans=0.125 2024-08-21 04:47:54,560 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=5091950.0, ans=0.0 2024-08-21 04:48:08,158 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5092050.0, ans=0.125 2024-08-21 04:48:34,298 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5092150.0, ans=0.125 2024-08-21 04:48:40,056 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 15 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-21 04:48:48,995 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5092150.0, ans=0.125 2024-08-21 04:48:57,113 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 5450, loss[loss=0.1248, beats_loss=0.008112, ecapa_loss=0.0001306, whisper_loss=0.1154, over 16655.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01036, ecapa_loss=0.0001388, whisper_loss=0.08921, over 3764287.71 frames. ], batch size: 61, lr: 1.75e-03, grad_scale: 1.152921504606847e+18 2024-08-21 04:49:28,234 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=5092350.0, ans=0.125 2024-08-21 04:49:34,636 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 
27 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-21 04:50:15,366 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=5092550.0, ans=0.2 2024-08-21 04:50:18,958 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.233e+01 2.525e+01 2.938e+01 2.405e+02, threshold=5.050e+01, percent-clipped=4.0 2024-08-21 04:50:23,120 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-21 04:50:32,640 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 25 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-21 04:50:36,819 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5092550.0, ans=0.1 2024-08-21 04:50:54,512 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5092650.0, ans=0.125 2024-08-21 04:51:09,328 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 5500, loss[loss=0.09414, beats_loss=0.0101, ecapa_loss=0.0001247, whisper_loss=0.08279, over 21619.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01039, ecapa_loss=0.0001373, whisper_loss=0.08904, over 3758764.35 frames. ], batch size: 83, lr: 1.75e-03, grad_scale: 1.152921504606847e+18 2024-08-21 04:51:13,064 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5092750.0, ans=0.0 2024-08-21 04:52:02,716 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=5092950.0, ans=0.125 2024-08-21 04:52:12,079 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 27 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-21 04:52:24,612 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.04 vs. 
limit=22.5 2024-08-21 04:52:41,776 INFO [train_multi_KD3.py:845] (1/4) A total of 54 cuts. 16 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-21 04:52:54,653 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5093150.0, ans=0.125 2024-08-21 04:52:57,281 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5093150.0, ans=0.125 2024-08-21 04:53:14,709 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=5093150.0, ans=0.125 2024-08-21 04:53:20,005 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5093250.0, ans=0.1 2024-08-21 04:53:20,890 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 5550, loss[loss=0.08783, beats_loss=0.01118, ecapa_loss=0.0001469, whisper_loss=0.07517, over 17448.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01036, ecapa_loss=0.0001378, whisper_loss=0.0898, over 3807088.86 frames. ], batch size: 73, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:53:29,044 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.50 vs. limit=22.5 2024-08-21 04:54:48,032 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.248e+01 2.482e+01 2.824e+01 3.933e+01, threshold=4.964e+01, percent-clipped=0.0 2024-08-21 04:54:52,926 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.59 vs. 
limit=15.0 2024-08-21 04:54:59,454 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=5093550.0, ans=0.125 2024-08-21 04:55:14,019 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5093650.0, ans=0.125 2024-08-21 04:55:18,869 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=5093650.0, ans=0.2 2024-08-21 04:55:22,995 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 29 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-21 04:55:33,847 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 5600, loss[loss=0.1308, beats_loss=0.0099, ecapa_loss=0.0001501, whisper_loss=0.1194, over 23595.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01027, ecapa_loss=0.0001395, whisper_loss=0.09044, over 3826825.85 frames. ], batch size: 93, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:56:04,895 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=5093850.0, ans=0.5 2024-08-21 04:56:14,117 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 
30 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-21 04:56:44,797 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=5093950.0, ans=0.0 2024-08-21 04:56:58,142 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5094050.0, ans=0.125 2024-08-21 04:57:05,385 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5094050.0, ans=0.1 2024-08-21 04:57:05,436 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=7.246e+00 2024-08-21 04:57:18,129 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 30 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-21 04:57:20,874 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 25 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-21 04:57:28,515 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5094150.0, ans=0.1 2024-08-21 04:57:35,945 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 5650, loss[loss=0.09829, beats_loss=0.01135, ecapa_loss=0.0001428, whisper_loss=0.08551, over 22531.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01028, ecapa_loss=0.0001397, whisper_loss=0.09051, over 3849783.24 frames. ], batch size: 94, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:57:47,059 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=5094250.0, ans=0.0 2024-08-21 04:57:50,997 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 
26 from LS+wenet, 28 from Vox, 31 from AS
2024-08-21 04:58:08,833 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5094350.0, ans=0.125
2024-08-21 04:58:49,182 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.209e+01 2.456e+01 2.705e+01 6.075e+01, threshold=4.911e+01, percent-clipped=1.0
2024-08-21 04:58:59,715 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.60 vs. limit=15.0
2024-08-21 04:59:34,812 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 5700, loss[loss=0.1, beats_loss=0.01079, ecapa_loss=0.0001599, whisper_loss=0.08766, over 22690.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01031, ecapa_loss=0.0001396, whisper_loss=0.09039, over 3867701.08 frames. ], batch size: 93, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 04:59:36,881 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5094750.0, ans=0.1
2024-08-21 04:59:39,581 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 34 from LS+wenet, 20 from Vox, 38 from AS
2024-08-21 04:59:43,585 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 18 from LS+wenet, 13 from Vox, 20 from AS
2024-08-21 04:59:48,461 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 32 from LS+wenet, 17 from Vox, 39 from AS
2024-08-21 05:00:09,900 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=5094850.0, ans=0.125
2024-08-21 05:00:15,036 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 26 from LS+wenet, 21 from Vox, 29 from AS
2024-08-21 05:00:22,944 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 22 from LS+wenet, 22 from Vox, 33 from AS
2024-08-21 05:01:02,388 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=5095150.0, ans=0.2
2024-08-21 05:01:08,865 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 26 from LS+wenet, 24 from Vox, 30 from AS
2024-08-21 05:01:15,958 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.68 vs. limit=15.0
2024-08-21 05:01:25,335 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 5750, loss[loss=0.1082, beats_loss=0.01165, ecapa_loss=9.502e-05, whisper_loss=0.09558, over 18309.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01032, ecapa_loss=0.0001385, whisper_loss=0.09018, over 3843804.70 frames. ], batch size: 67, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 05:01:34,438 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 33 from LS+wenet, 26 from Vox, 23 from AS
2024-08-21 05:01:50,905 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 26 from LS+wenet, 15 from Vox, 46 from AS
2024-08-21 05:02:13,094 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 28 from LS+wenet, 27 from Vox, 30 from AS
2024-08-21 05:02:32,957 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=5095550.0, ans=0.125
2024-08-21 05:02:38,151 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5095550.0, ans=0.125
2024-08-21 05:02:38,934 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.297e+01 2.497e+01 2.736e+01 4.299e+01, threshold=4.994e+01, percent-clipped=0.0
2024-08-21 05:02:56,927 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=5095650.0, ans=0.0
2024-08-21 05:03:17,357 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=5095650.0, ans=0.0
2024-08-21 05:03:20,580 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 5800, loss[loss=0.09786, beats_loss=0.01033, ecapa_loss=0.0001506, whisper_loss=0.08602, over 20748.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01034, ecapa_loss=0.000139, whisper_loss=0.09032, over 3865643.62 frames. ], batch size: 81, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 05:03:42,583 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=5095850.0, ans=0.0
2024-08-21 05:03:49,431 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=5095850.0, ans=0.0
2024-08-21 05:03:57,398 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 21 from LS+wenet, 17 from Vox, 36 from AS
2024-08-21 05:03:58,688 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5095850.0, ans=0.125
2024-08-21 05:04:20,723 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 32 from LS+wenet, 18 from Vox, 43 from AS
2024-08-21 05:04:32,921 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 17 from LS+wenet, 21 from Vox, 25 from AS
2024-08-21 05:05:04,140 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 20 from LS+wenet, 14 from Vox, 21 from AS
2024-08-21 05:05:05,935 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 35 from LS+wenet, 23 from Vox, 33 from AS
2024-08-21 05:05:11,382 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 5850, loss[loss=0.1074, beats_loss=0.009519, ecapa_loss=0.0001073, whisper_loss=0.09683, over 17760.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01036, ecapa_loss=0.000138, whisper_loss=0.09059, over 3855804.12 frames. ], batch size: 66, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 05:05:13,682 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=5096250.0, ans=0.125
2024-08-21 05:05:38,466 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 19 from LS+wenet, 20 from Vox, 27 from AS
2024-08-21 05:05:41,304 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=5096350.0, ans=0.0
2024-08-21 05:06:12,047 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=5096550.0, ans=0.0
2024-08-21 05:06:14,563 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.298e+01 2.570e+01 2.824e+01 3.912e+01, threshold=5.140e+01, percent-clipped=0.0
2024-08-21 05:06:26,676 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 17 from LS+wenet, 11 from Vox, 23 from AS
2024-08-21 05:06:27,093 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5096550.0, ans=0.125
2024-08-21 05:06:27,404 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=5096550.0, ans=0.0
2024-08-21 05:06:29,250 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=5096650.0, ans=0.2
2024-08-21 05:06:29,289 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=5096650.0, ans=0.0
2024-08-21 05:06:50,090 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5096750.0, ans=0.0
2024-08-21 05:06:50,149 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5096750.0, ans=0.125
2024-08-21 05:06:50,866 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 5900, loss[loss=0.1007, beats_loss=0.01041, ecapa_loss=0.0001483, whisper_loss=0.08876, over 19633.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01036, ecapa_loss=0.000138, whisper_loss=0.0902, over 3831080.96 frames. ], batch size: 82, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 05:06:59,207 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=5096750.0, ans=0.0
2024-08-21 05:07:07,211 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5096750.0, ans=0.125
2024-08-21 05:07:22,159 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=5096850.0, ans=0.0
2024-08-21 05:07:26,667 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.94 vs. limit=15.0
2024-08-21 05:07:27,339 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=5096850.0, ans=0.125
2024-08-21 05:07:36,440 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5096950.0, ans=0.125
2024-08-21 05:07:45,528 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 from AS
2024-08-21 05:08:06,099 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 25 from LS+wenet, 22 from Vox, 42 from AS
2024-08-21 05:08:35,844 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 5950, loss[loss=0.08175, beats_loss=0.01321, ecapa_loss=0.0001058, whisper_loss=0.06748, over 21833.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01038, ecapa_loss=0.000137, whisper_loss=0.0901, over 3839505.23 frames. ], batch size: 89, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 05:08:44,059 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 29 from LS+wenet, 17 from Vox, 30 from AS
2024-08-21 05:08:51,229 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 36 from LS+wenet, 16 from Vox, 37 from AS
2024-08-21 05:09:03,672 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 25 from LS+wenet, 20 from Vox, 46 from AS
2024-08-21 05:09:25,607 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=5097450.0, ans=15.0
2024-08-21 05:09:40,112 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=5097550.0, ans=0.0
2024-08-21 05:09:43,023 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.990e+01 2.327e+01 2.629e+01 2.901e+01 4.645e+01, threshold=5.259e+01, percent-clipped=0.0
2024-08-21 05:10:16,078 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 6000, loss[loss=0.1054, beats_loss=0.009394, ecapa_loss=0.0001423, whisper_loss=0.09455, over 13342.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01043, ecapa_loss=0.0001364, whisper_loss=0.08955, over 3796649.87 frames. ], batch size: 54, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 05:10:16,078 INFO [train_multi_KD3.py:1140] (1/4) Computing validation loss
2024-08-21 05:10:54,017 INFO [train_multi_KD3.py:1150] (1/4) Epoch 35, validation on ASR_libri: loss=0.2537, beats_loss=0, ecapa_loss=0.0005022, whisper_loss=0.2487, over 931116.00 frames.
2024-08-21 05:11:19,416 INFO [train_multi_KD3.py:1150] (1/4) Epoch 35, validation on SV_voxceleb1: loss=0.003907, beats_loss=0, ecapa_loss=0.0003907, whisper_loss=0, over 944235.00 frames.
2024-08-21 05:13:02,950 INFO [train_multi_KD3.py:1150] (1/4) Epoch 35, validation on AT_audioset: loss=0.023, beats_loss=0.023, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-21 05:13:02,954 INFO [train_multi_KD3.py:1156] (1/4) Maximum memory allocated so far is 30838MB
2024-08-21 05:13:21,330 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5097850.0, ans=0.1
2024-08-21 05:13:39,565 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=5097950.0, ans=0.125
2024-08-21 05:13:43,866 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=5097950.0, ans=0.0
2024-08-21 05:13:51,323 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5097950.0, ans=0.1
2024-08-21 05:14:16,168 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.41 vs. limit=15.0
2024-08-21 05:14:26,041 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 17 from LS+wenet, 17 from Vox, 34 from AS
2024-08-21 05:14:33,337 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.78 vs. limit=15.0
2024-08-21 05:14:37,236 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 6050, loss[loss=0.07981, beats_loss=0.009788, ecapa_loss=0.0001656, whisper_loss=0.06837, over 18907.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01044, ecapa_loss=0.0001371, whisper_loss=0.08856, over 3772079.79 frames. ], batch size: 78, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 05:15:12,474 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=1.584e+05
2024-08-21 05:15:38,660 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=5098550.0, ans=0.125
2024-08-21 05:15:39,461 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.762e+01 2.180e+01 2.501e+01 2.659e+01 4.695e+01, threshold=5.002e+01, percent-clipped=0.0
2024-08-21 05:15:45,223 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5098550.0, ans=0.125
2024-08-21 05:16:00,547 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=5098650.0, ans=0.0
2024-08-21 05:16:12,193 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=5098750.0, ans=0.05
2024-08-21 05:16:12,977 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 6100, loss[loss=0.102, beats_loss=0.01075, ecapa_loss=0.0001116, whisper_loss=0.09011, over 23643.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01044, ecapa_loss=0.0001375, whisper_loss=0.08926, over 3776568.39 frames. ], batch size: 90, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 05:16:24,151 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=5098750.0, ans=0.2
2024-08-21 05:16:37,555 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=5098850.0, ans=0.09899494936611666
2024-08-21 05:16:40,798 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 33 from LS+wenet, 20 from Vox, 37 from AS
2024-08-21 05:17:00,107 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 18 from LS+wenet, 17 from Vox, 33 from AS
2024-08-21 05:17:03,882 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 35 from LS+wenet, 23 from Vox, 31 from AS
2024-08-21 05:17:45,595 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=5099150.0, ans=0.0
2024-08-21 05:17:50,051 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.90 vs. limit=15.0
2024-08-21 05:17:51,999 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 6150, loss[loss=0.1191, beats_loss=0.009795, ecapa_loss=0.0001183, whisper_loss=0.1082, over 19164.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01039, ecapa_loss=0.0001381, whisper_loss=0.08986, over 3801594.72 frames. ], batch size: 71, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 05:17:55,205 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=5099250.0, ans=0.05
2024-08-21 05:17:55,206 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=5099250.0, ans=0.5
2024-08-21 05:18:19,155 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.66 vs. limit=15.0
2024-08-21 05:18:24,749 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 26 from LS+wenet, 15 from Vox, 45 from AS
2024-08-21 05:18:28,033 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 25 from LS+wenet, 26 from Vox, 42 from AS
2024-08-21 05:18:32,623 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=6.906e+01
2024-08-21 05:18:53,362 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.733e+01 2.269e+01 2.480e+01 2.871e+01 4.819e+02, threshold=4.960e+01, percent-clipped=2.0
2024-08-21 05:18:58,999 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 23 from LS+wenet, 17 from Vox, 31 from AS
2024-08-21 05:19:01,355 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=5099550.0, ans=0.125
2024-08-21 05:19:16,298 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=5099650.0, ans=0.0
2024-08-21 05:19:26,650 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 6200, loss[loss=0.1074, beats_loss=0.008845, ecapa_loss=0.0001436, whisper_loss=0.09711, over 15910.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01052, ecapa_loss=0.000138, whisper_loss=0.08938, over 3811819.04 frames. ], batch size: 63, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 05:19:59,639 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 21 from LS+wenet, 18 from Vox, 38 from AS
2024-08-21 05:20:00,798 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=5099850.0, ans=0.09899494936611666
2024-08-21 05:20:02,513 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5099850.0, ans=0.125
2024-08-21 05:20:24,634 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=5100050.0, ans=0.015
2024-08-21 05:20:29,897 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.38 vs. limit=15.0
2024-08-21 05:20:50,485 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.whiten.whitening_limit, batch_count=5100150.0, ans=12.0
2024-08-21 05:20:59,536 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 28 from LS+wenet, 19 from Vox, 29 from AS
2024-08-21 05:21:06,183 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 20 from LS+wenet, 20 from Vox, 37 from AS
2024-08-21 05:21:08,194 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 6250, loss[loss=0.08272, beats_loss=0.01114, ecapa_loss=0.0001238, whisper_loss=0.07035, over 19213.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01049, ecapa_loss=0.0001379, whisper_loss=0.08947, over 3806505.50 frames. ], batch size: 77, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 05:21:26,466 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=5100350.0, ans=0.0
2024-08-21 05:21:39,010 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 27 from LS+wenet, 19 from Vox, 25 from AS
2024-08-21 05:21:42,450 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5100350.0, ans=0.125
2024-08-21 05:21:52,377 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=5100450.0, ans=0.2
2024-08-21 05:21:57,603 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=5100450.0, ans=0.0
2024-08-21 05:22:05,920 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=5100450.0, ans=0.0
2024-08-21 05:22:12,583 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.226e+01 2.520e+01 2.834e+01 9.847e+01, threshold=5.039e+01, percent-clipped=0.0
2024-08-21 05:22:27,884 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5100650.0, ans=0.1
2024-08-21 05:22:49,574 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 6300, loss[loss=0.1103, beats_loss=0.009412, ecapa_loss=0.0001222, whisper_loss=0.0997, over 15357.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0104, ecapa_loss=0.0001387, whisper_loss=0.08946, over 3797464.21 frames. ], batch size: 56, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 05:22:59,231 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 24 from LS+wenet, 16 from Vox, 20 from AS
2024-08-21 05:23:44,496 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 24 from LS+wenet, 24 from Vox, 40 from AS
2024-08-21 05:23:54,218 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 22 from LS+wenet, 22 from Vox, 35 from AS
2024-08-21 05:24:20,145 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5101150.0, ans=0.125
2024-08-21 05:24:21,140 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 29 from LS+wenet, 17 from Vox, 35 from AS
2024-08-21 05:24:24,557 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.10 vs. limit=15.0
2024-08-21 05:24:26,797 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 6350, loss[loss=0.09001, beats_loss=0.01185, ecapa_loss=0.000118, whisper_loss=0.07698, over 17517.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01039, ecapa_loss=0.0001383, whisper_loss=0.09018, over 3847607.28 frames. ], batch size: 71, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 05:24:44,357 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=5101250.0, ans=0.04949747468305833
2024-08-21 05:24:47,941 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=5101350.0, ans=0.125
2024-08-21 05:25:01,998 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 21 from LS+wenet, 20 from Vox, 23 from AS
2024-08-21 05:25:06,761 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 31 from LS+wenet, 26 from Vox, 34 from AS
2024-08-21 05:25:10,215 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5101450.0, ans=0.125
2024-08-21 05:25:11,190 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 22 from LS+wenet, 15 from Vox, 26 from AS
2024-08-21 05:25:19,428 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5101450.0, ans=0.0
2024-08-21 05:25:20,991 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5101450.0, ans=0.125
2024-08-21 05:25:21,271 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.06 vs. limit=22.5
2024-08-21 05:25:30,124 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.267e+01 2.496e+01 2.800e+01 3.336e+01, threshold=4.993e+01, percent-clipped=1.0
2024-08-21 05:25:41,430 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=5101550.0, ans=0.05
2024-08-21 05:26:04,599 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 6400, loss[loss=0.1129, beats_loss=0.009831, ecapa_loss=0.0001462, whisper_loss=0.1016, over 22814.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01037, ecapa_loss=0.0001385, whisper_loss=0.09019, over 3845109.83 frames. ], batch size: 91, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 05:26:09,116 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=5101750.0, ans=0.125
2024-08-21 05:26:13,256 WARNING [optim.py:496] (1/4) Scaling gradients by 0.0562300942838192, model_norm_threshold=49.92792510986328
2024-08-21 05:26:13,416 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.1.norm.log_scale with proportion 0.09, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.997e+04, grad_sumsq=6.997e+04, orig_rms_sq=1.000e+00
2024-08-21 05:26:40,704 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=5101850.0, ans=0.2
2024-08-21 05:26:50,947 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5101950.0, ans=0.1
2024-08-21 05:26:57,242 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5101950.0, ans=0.1
2024-08-21 05:27:08,048 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 28 from LS+wenet, 19 from Vox, 34 from AS
2024-08-21 05:27:31,296 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=5102150.0, ans=0.2
2024-08-21 05:27:37,080 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 6450, loss[loss=0.09959, beats_loss=0.01188, ecapa_loss=0.0001347, whisper_loss=0.08636, over 22679.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01048, ecapa_loss=0.0001388, whisper_loss=0.08995, over 3827805.38 frames. ], batch size: 93, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 05:28:11,199 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5102350.0, ans=0.125
2024-08-21 05:28:29,129 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=5102450.0, ans=10.0
2024-08-21 05:28:32,019 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.689e+00
2024-08-21 05:28:34,914 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.807e+01 2.209e+01 2.498e+01 2.911e+01 8.879e+02, threshold=4.995e+01, percent-clipped=1.0
2024-08-21 05:28:57,450 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 18 from LS+wenet, 14 from Vox, 26 from AS
2024-08-21 05:29:04,805 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 29 from LS+wenet, 26 from Vox, 33 from AS
2024-08-21 05:29:07,901 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 6500, loss[loss=0.08151, beats_loss=0.01378, ecapa_loss=9.19e-05, whisper_loss=0.06681, over 19652.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0104, ecapa_loss=0.0001387, whisper_loss=0.09028, over 3815100.21 frames. ], batch size: 79, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 05:29:24,156 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 24 from LS+wenet, 16 from Vox, 41 from AS
2024-08-21 05:29:28,333 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5102850.0, ans=0.125
2024-08-21 05:29:33,364 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=5102850.0, ans=0.0
2024-08-21 05:30:03,100 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=5102950.0, ans=10.0
2024-08-21 05:30:12,575 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 28 from LS+wenet, 18 from Vox, 25 from AS
2024-08-21 05:30:25,892 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=5103050.0, ans=0.2
2024-08-21 05:30:27,977 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=5103150.0, ans=0.125
2024-08-21 05:30:50,202 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 6550, loss[loss=0.09687, beats_loss=0.01215, ecapa_loss=0.0001179, whisper_loss=0.08355, over 23159.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01033, ecapa_loss=0.0001398, whisper_loss=0.0906, over 3843038.68 frames. ], batch size: 93, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 05:31:04,674 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=5103250.0, ans=0.0
2024-08-21 05:31:28,184 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 23 from LS+wenet, 11 from Vox, 32 from AS
2024-08-21 05:31:35,514 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=5103450.0, ans=0.125
2024-08-21 05:31:37,674 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.83 vs. limit=12.0
2024-08-21 05:31:46,441 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 14 from LS+wenet, 18 from Vox, 25 from AS
2024-08-21 05:31:59,039 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.280e+01 2.467e+01 2.785e+01 3.437e+01, threshold=4.934e+01, percent-clipped=0.0
2024-08-21 05:32:25,264 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=5103650.0, ans=0.0
2024-08-21 05:32:25,481 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.66 vs. limit=15.0
2024-08-21 05:32:34,101 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 6600, loss[loss=0.1066, beats_loss=0.009298, ecapa_loss=0.0001658, whisper_loss=0.09568, over 14533.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01034, ecapa_loss=0.0001399, whisper_loss=0.09008, over 3854895.21 frames. ], batch size: 59, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 05:32:37,094 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5103750.0, ans=0.125
2024-08-21 05:33:25,056 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 29 from LS+wenet, 29 from Vox, 31 from AS
2024-08-21 05:33:30,507 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=5103950.0, ans=0.0
2024-08-21 05:33:47,389 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 26 from LS+wenet, 17 from Vox, 37 from AS
2024-08-21 05:33:58,236 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 27 from LS+wenet, 21 from Vox, 43 from AS
2024-08-21 05:34:06,931 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5104150.0, ans=0.125
2024-08-21 05:34:11,155 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 17 from LS+wenet, 18 from Vox, 22 from AS
2024-08-21 05:34:13,039 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 6650, loss[loss=0.09074, beats_loss=0.008688, ecapa_loss=0.00016, whisper_loss=0.08045, over 13914.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01033, ecapa_loss=0.0001398, whisper_loss=0.09066, over 3851427.50 frames. ], batch size: 57, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 05:34:26,942 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 22 from LS+wenet, 20 from Vox, 42 from AS
2024-08-21 05:34:31,186 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 35 from LS+wenet, 17 from Vox, 34 from AS
2024-08-21 05:34:49,919 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5104350.0, ans=0.1
2024-08-21 05:34:55,774 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5104450.0, ans=0.125
2024-08-21 05:35:15,646 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.928e+01 2.320e+01 2.541e+01 2.816e+01 4.422e+01, threshold=5.082e+01, percent-clipped=0.0
2024-08-21 05:35:25,022 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.76 vs. limit=15.0
2024-08-21 05:35:37,227 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 33 from LS+wenet, 21 from Vox, 29 from AS
2024-08-21 05:35:46,481 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5104650.0, ans=0.1
2024-08-21 05:35:51,146 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 6700, loss[loss=0.09876, beats_loss=0.01122, ecapa_loss=0.0001613, whisper_loss=0.08592, over 12815.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01031, ecapa_loss=0.0001393, whisper_loss=0.09136, over 3879249.28 frames. ], batch size: 53, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 05:36:20,997 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5104850.0, ans=0.1
2024-08-21 05:36:21,403 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.57 vs. limit=15.0
2024-08-21 05:36:38,777 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=5104950.0, ans=0.2
2024-08-21 05:37:27,724 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 6750, loss[loss=0.1335, beats_loss=0.005617, ecapa_loss=0.0001871, whisper_loss=0.1261, over 13847.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0103, ecapa_loss=0.0001408, whisper_loss=0.09093, over 3896153.32 frames. ], batch size: 52, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 05:37:32,254 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 35 from LS+wenet, 21 from Vox, 21 from AS
2024-08-21 05:37:40,125 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5105250.0, ans=0.1
2024-08-21 05:37:47,468 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.84 vs. limit=15.0
2024-08-21 05:37:59,416 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 19 from LS+wenet, 15 from Vox, 26 from AS
2024-08-21 05:38:06,621 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 33 from LS+wenet, 13 from Vox, 46 from AS
2024-08-21 05:38:16,977 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 17 from LS+wenet, 15 from Vox, 19 from AS
2024-08-21 05:38:25,763 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.375e+01 2.599e+01 2.848e+01 3.757e+01, threshold=5.199e+01, percent-clipped=0.0
2024-08-21 05:38:50,605 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 25 from LS+wenet, 21 from Vox, 28 from AS
2024-08-21 05:38:54,271 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 19 from LS+wenet, 27 from Vox, 39 from AS
2024-08-21 05:38:59,458 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 6800, loss[loss=0.1056, beats_loss=0.01031, ecapa_loss=0.0001633, whisper_loss=0.09367, over 22475.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01038, ecapa_loss=0.0001408, whisper_loss=0.09028, over 3870592.15 frames. ], batch size: 94, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 05:39:11,703 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 21 from LS+wenet, 23 from Vox, 39 from AS
2024-08-21 05:39:14,748 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=5105750.0, ans=0.025
2024-08-21 05:39:16,909 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.71 vs. limit=6.0
2024-08-21 05:39:29,472 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=5105850.0, ans=0.125
2024-08-21 05:39:35,149 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.44 vs. limit=22.5
2024-08-21 05:39:44,803 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 19 from LS+wenet, 20 from Vox, 29 from AS
2024-08-21 05:39:53,650 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.10 vs. limit=6.0
2024-08-21 05:40:04,340 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.77 vs. limit=15.0
2024-08-21 05:40:18,480 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 32 from LS+wenet, 24 from Vox, 35 from AS
2024-08-21 05:40:33,857 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 6850, loss[loss=0.126, beats_loss=0.007409, ecapa_loss=0.0001593, whisper_loss=0.117, over 23216.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01035, ecapa_loss=0.0001415, whisper_loss=0.09019, over 3853311.12 frames. ], batch size: 94, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 05:40:38,660 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. limit=6.0
2024-08-21 05:40:46,869 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=5106250.0, ans=0.0
2024-08-21 05:41:10,069 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=5106450.0, ans=0.125
2024-08-21 05:41:16,819 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=5106450.0, ans=0.015
2024-08-21 05:41:32,729 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.742e+01 2.327e+01 2.654e+01 2.944e+01 2.744e+02, threshold=5.308e+01, percent-clipped=2.0
2024-08-21 05:42:05,971 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 6900, loss[loss=0.08598, beats_loss=0.01215, ecapa_loss=0.0001354, whisper_loss=0.07247, over 19959.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01035, ecapa_loss=0.000141, whisper_loss=0.08995, over 3860017.64 frames.
], batch size: 81, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:42:08,916 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5106750.0, ans=0.125 2024-08-21 05:42:36,198 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 28 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-21 05:42:54,506 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=5106950.0, ans=0.125 2024-08-21 05:42:54,617 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=5106950.0, ans=0.0 2024-08-21 05:43:16,571 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=5107150.0, ans=0.125 2024-08-21 05:43:35,495 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 6950, loss[loss=0.1293, beats_loss=0.008503, ecapa_loss=0.0001235, whisper_loss=0.1196, over 20495.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01033, ecapa_loss=0.0001399, whisper_loss=0.08984, over 3842882.79 frames. ], batch size: 76, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:43:38,539 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.05 vs. limit=22.5 2024-08-21 05:43:43,077 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5107250.0, ans=0.125 2024-08-21 05:44:02,181 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 22 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-21 05:44:13,774 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=5107450.0, ans=0.0 2024-08-21 05:44:27,197 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 
26 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-21 05:44:28,939 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.85 vs. limit=12.0 2024-08-21 05:44:32,094 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=5107550.0, ans=0.07 2024-08-21 05:44:32,781 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.277e+01 2.529e+01 2.921e+01 4.469e+01, threshold=5.057e+01, percent-clipped=0.0 2024-08-21 05:44:33,764 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5107550.0, ans=0.125 2024-08-21 05:44:37,399 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5107550.0, ans=0.1 2024-08-21 05:44:42,969 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5107550.0, ans=0.125 2024-08-21 05:44:49,278 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 13 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-21 05:45:06,282 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 7000, loss[loss=0.1131, beats_loss=0.01011, ecapa_loss=0.0001597, whisper_loss=0.1014, over 22534.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01032, ecapa_loss=0.0001403, whisper_loss=0.08949, over 3799296.52 frames. ], batch size: 93, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:45:07,032 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5107750.0, ans=0.125 2024-08-21 05:45:14,604 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=5107750.0, ans=0.2 2024-08-21 05:45:15,760 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 
29 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-21 05:45:18,580 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=5107750.0, ans=0.125 2024-08-21 05:45:22,080 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.18 vs. limit=15.0 2024-08-21 05:45:23,070 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 22 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-21 05:45:26,320 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 22 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-21 05:45:29,142 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5107850.0, ans=0.125 2024-08-21 05:45:42,088 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 24 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-21 05:45:51,615 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5107950.0, ans=0.1 2024-08-21 05:46:01,039 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5107950.0, ans=0.0 2024-08-21 05:46:05,815 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 23 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-21 05:46:11,876 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.50 vs. limit=22.5 2024-08-21 05:46:21,468 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.46 vs. limit=22.5 2024-08-21 05:46:38,979 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 7050, loss[loss=0.1213, beats_loss=0.01013, ecapa_loss=0.0001346, whisper_loss=0.1098, over 19888.00 frames. 
], tot_loss[loss=0.1022, beats_loss=0.01034, ecapa_loss=0.0001392, whisper_loss=0.09048, over 3805898.00 frames. ], batch size: 80, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:46:51,171 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=5108250.0, ans=15.0 2024-08-21 05:46:53,733 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=5108250.0, ans=0.0 2024-08-21 05:46:56,212 WARNING [optim.py:496] (1/4) Scaling gradients by 0.049437928944826126, model_norm_threshold=50.57056427001953 2024-08-21 05:46:56,368 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.19, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.007e+05, grad_sumsq=1.862e+07, orig_rms_sq=1.077e-02 2024-08-21 05:47:16,296 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=5108450.0, ans=0.0 2024-08-21 05:47:29,952 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 29 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-21 05:47:34,952 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.310e+01 2.518e+01 2.782e+01 1.023e+03, threshold=5.036e+01, percent-clipped=2.0 2024-08-21 05:47:58,434 WARNING [optim.py:496] (1/4) Scaling gradients by 0.07700152695178986, model_norm_threshold=50.36127471923828 2024-08-21 05:47:58,592 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.18, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.694e+04, grad_sumsq=7.694e+04, orig_rms_sq=1.000e+00 2024-08-21 05:48:05,787 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 
31 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-21 05:48:06,698 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=5108750.0, ans=0.2 2024-08-21 05:48:07,063 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2024-08-21 05:48:07,494 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 7100, loss[loss=0.106, beats_loss=0.00855, ecapa_loss=0.0001656, whisper_loss=0.09579, over 19628.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01039, ecapa_loss=0.0001393, whisper_loss=0.08965, over 3777729.07 frames. ], batch size: 81, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:48:26,630 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.142e+00 2024-08-21 05:48:36,521 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 17 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-21 05:48:59,844 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 25 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-21 05:49:00,483 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=5109050.0, ans=0.125 2024-08-21 05:49:02,345 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.32 vs. 
limit=15.0 2024-08-21 05:49:07,561 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=5109050.0, ans=0.0 2024-08-21 05:49:14,850 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=5109050.0, ans=0.0 2024-08-21 05:49:35,491 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=5109250.0, ans=0.2 2024-08-21 05:49:36,238 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 7150, loss[loss=0.1144, beats_loss=0.008451, ecapa_loss=0.000145, whisper_loss=0.1045, over 22091.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01047, ecapa_loss=0.0001377, whisper_loss=0.08924, over 3812930.76 frames. ], batch size: 88, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:49:54,317 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=5109350.0, ans=0.0 2024-08-21 05:49:56,133 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=5109350.0, ans=0.125 2024-08-21 05:49:57,986 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=5109350.0, ans=0.125 2024-08-21 05:50:01,038 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 22 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-21 05:50:09,631 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 
23 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-21 05:50:17,355 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=5109450.0, ans=0.125 2024-08-21 05:50:26,516 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5109450.0, ans=0.125 2024-08-21 05:50:28,375 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5109550.0, ans=0.125 2024-08-21 05:50:29,251 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 23 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-21 05:50:32,479 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.665e+01 2.268e+01 2.455e+01 2.616e+01 6.540e+02, threshold=4.909e+01, percent-clipped=2.0 2024-08-21 05:50:35,544 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-21 05:50:36,516 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 22 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-21 05:50:41,176 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=5109550.0, ans=0.125 2024-08-21 05:50:41,563 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.49 vs. 
limit=15.0 2024-08-21 05:50:45,005 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=5109550.0, ans=0.125 2024-08-21 05:50:52,305 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=5109650.0, ans=0.2 2024-08-21 05:50:58,058 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5109650.0, ans=0.1 2024-08-21 05:51:05,206 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5109750.0, ans=0.0 2024-08-21 05:51:05,956 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 7200, loss[loss=0.105, beats_loss=0.01148, ecapa_loss=0.0001085, whisper_loss=0.09243, over 12810.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01053, ecapa_loss=0.000138, whisper_loss=0.08859, over 3832486.97 frames. ], batch size: 49, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:51:16,512 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=5109750.0, ans=0.0 2024-08-21 05:51:23,783 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 34 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-21 05:51:33,087 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 
28 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-21 05:51:35,419 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5109850.0, ans=0.1 2024-08-21 05:51:36,472 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=5109850.0, ans=0.125 2024-08-21 05:51:42,872 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5109950.0, ans=0.1 2024-08-21 05:52:18,402 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 19 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-21 05:52:37,641 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 7250, loss[loss=0.111, beats_loss=0.008039, ecapa_loss=0.0001416, whisper_loss=0.1016, over 13605.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01042, ecapa_loss=0.0001386, whisper_loss=0.08956, over 3832956.78 frames. ], batch size: 52, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:52:41,800 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=5110250.0, ans=0.125 2024-08-21 05:52:47,052 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 
23 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-21 05:53:01,062 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=5110350.0, ans=0.04949747468305833 2024-08-21 05:53:16,977 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5110450.0, ans=0.125 2024-08-21 05:53:29,880 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=5110450.0, ans=0.04949747468305833 2024-08-21 05:53:32,650 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=5110550.0, ans=0.0 2024-08-21 05:53:35,124 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.241e+01 2.447e+01 2.768e+01 8.311e+01, threshold=4.894e+01, percent-clipped=2.0 2024-08-21 05:53:51,453 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.78 vs. limit=15.0 2024-08-21 05:54:07,920 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 7300, loss[loss=0.07999, beats_loss=0.01182, ecapa_loss=0.0001305, whisper_loss=0.06686, over 21337.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01038, ecapa_loss=0.0001391, whisper_loss=0.08921, over 3830320.24 frames. ], batch size: 87, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:54:19,902 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 
18 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-21 05:54:30,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=5110850.0, ans=0.2 2024-08-21 05:54:40,987 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5110850.0, ans=0.125 2024-08-21 05:54:58,770 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.12 vs. limit=15.0 2024-08-21 05:55:15,138 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 15 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-21 05:55:18,116 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=5111150.0, ans=0.09899494936611666 2024-08-21 05:55:28,917 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.51 vs. limit=12.0 2024-08-21 05:55:35,852 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=5111250.0, ans=0.125 2024-08-21 05:55:36,594 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 7350, loss[loss=0.1115, beats_loss=0.01146, ecapa_loss=0.0001125, whisper_loss=0.09895, over 19495.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01041, ecapa_loss=0.0001396, whisper_loss=0.08872, over 3842186.10 frames. ], batch size: 76, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:55:48,316 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5111250.0, ans=0.125 2024-08-21 05:56:08,334 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=5111350.0, ans=0.0 2024-08-21 05:56:28,591 INFO [train_multi_KD3.py:845] (1/4) A total of 52 cuts. 
16 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-21 05:56:33,608 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.293e+01 2.578e+01 2.831e+01 4.096e+01, threshold=5.157e+01, percent-clipped=0.0 2024-08-21 05:56:35,731 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 23 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-21 05:56:45,453 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.17 vs. limit=22.5 2024-08-21 05:56:55,682 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=5111650.0, ans=0.0 2024-08-21 05:57:04,674 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 7400, loss[loss=0.1124, beats_loss=0.01007, ecapa_loss=0.0001357, whisper_loss=0.1009, over 21801.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01039, ecapa_loss=0.0001409, whisper_loss=0.08876, over 3863127.45 frames. ], batch size: 87, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:57:26,112 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 21 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-21 05:57:40,927 WARNING [optim.py:496] (1/4) Scaling gradients by 0.08977154642343521, model_norm_threshold=51.56612014770508 2024-08-21 05:57:41,085 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.273e+04, grad_sumsq=4.273e+04, orig_rms_sq=1.000e+00 2024-08-21 05:57:42,819 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 32 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-21 05:58:09,407 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.35 vs. 
limit=15.0 2024-08-21 05:58:21,082 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5112150.0, ans=0.0 2024-08-21 05:58:31,573 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5112150.0, ans=0.125 2024-08-21 05:58:34,083 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 7450, loss[loss=0.08206, beats_loss=0.01226, ecapa_loss=9.314e-05, whisper_loss=0.06886, over 16817.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01043, ecapa_loss=0.0001406, whisper_loss=0.08891, over 3871517.08 frames. ], batch size: 64, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:59:09,062 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=5112450.0, ans=0.2 2024-08-21 05:59:09,067 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5112450.0, ans=0.0 2024-08-21 05:59:10,109 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 13 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-21 05:59:21,605 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=5112450.0, ans=0.05 2024-08-21 05:59:23,510 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=5112450.0, ans=0.125 2024-08-21 05:59:28,911 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. 
limit=6.0 2024-08-21 05:59:31,715 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.315e+01 2.613e+01 3.029e+01 5.744e+02, threshold=5.226e+01, percent-clipped=1.0 2024-08-21 05:59:44,903 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.71 vs. limit=12.0 2024-08-21 05:59:50,643 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 23 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-21 06:00:03,582 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 7500, loss[loss=0.09373, beats_loss=0.01245, ecapa_loss=0.0001129, whisper_loss=0.08015, over 16773.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01043, ecapa_loss=0.0001399, whisper_loss=0.08898, over 3872014.91 frames. ], batch size: 65, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:00:19,193 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5112750.0, ans=0.1 2024-08-21 06:00:22,140 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 18 from LS+wenet, 10 from Vox, 32 fro AS 2024-08-21 06:01:00,123 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 28 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-21 06:01:34,458 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 7550, loss[loss=0.1201, beats_loss=0.01131, ecapa_loss=0.0001584, whisper_loss=0.1072, over 20899.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01048, ecapa_loss=0.0001395, whisper_loss=0.08873, over 3829764.60 frames. 
], batch size: 88, lr: 1.74e-03, grad_scale: 1.152921504606847e+18 2024-08-21 06:01:39,902 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=5113250.0, ans=0.0 2024-08-21 06:01:47,307 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5113250.0, ans=0.125 2024-08-21 06:02:12,301 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 29 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-21 06:02:36,283 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.753e+01 2.241e+01 2.500e+01 2.791e+01 3.634e+01, threshold=5.000e+01, percent-clipped=0.0 2024-08-21 06:02:36,553 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 32 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-21 06:02:54,234 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=5113650.0, ans=0.0 2024-08-21 06:03:07,990 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 7600, loss[loss=0.09883, beats_loss=0.009189, ecapa_loss=0.0001739, whisper_loss=0.0879, over 20272.00 frames. ], tot_loss[loss=0.09995, beats_loss=0.01048, ecapa_loss=0.0001406, whisper_loss=0.08806, over 3820036.82 frames. ], batch size: 86, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:03:09,131 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=5113750.0, ans=0.035 2024-08-21 06:03:10,724 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 30 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-21 06:03:33,764 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 19 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-21 06:03:41,841 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 28 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-21 06:04:06,590 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 
26 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-21 06:04:23,493 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=5114150.0, ans=0.07 2024-08-21 06:04:40,888 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 23 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-21 06:04:42,287 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 7650, loss[loss=0.09053, beats_loss=0.01096, ecapa_loss=0.0001644, whisper_loss=0.07792, over 20342.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01044, ecapa_loss=0.0001401, whisper_loss=0.0884, over 3805659.09 frames. ], batch size: 89, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:04:48,717 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5114250.0, ans=0.1 2024-08-21 06:04:50,234 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5114250.0, ans=0.125 2024-08-21 06:04:50,309 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5114250.0, ans=0.125 2024-08-21 06:05:06,838 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 
25 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-21 06:05:11,139 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5114350.0, ans=0.1 2024-08-21 06:05:42,896 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.284e+01 2.478e+01 2.742e+01 4.351e+01, threshold=4.957e+01, percent-clipped=0.0 2024-08-21 06:05:45,573 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=5114550.0, ans=0.0 2024-08-21 06:05:58,512 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5114650.0, ans=0.125 2024-08-21 06:06:10,538 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 30 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-21 06:06:12,384 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 14 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-21 06:06:12,645 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5114750.0, ans=0.125 2024-08-21 06:06:12,679 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5114750.0, ans=0.1 2024-08-21 06:06:13,469 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 7700, loss[loss=0.0849, beats_loss=0.01129, ecapa_loss=0.0001728, whisper_loss=0.07188, over 13796.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01035, ecapa_loss=0.0001407, whisper_loss=0.08915, over 3820312.87 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:06:24,483 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5114750.0, ans=0.125 2024-08-21 06:06:48,147 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 
18 from LS+wenet, 26 from Vox, 31 from AS 2024-08-21 06:06:55,561 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=5114850.0, ans=0.2 2024-08-21 06:07:00,925 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 26 from LS+wenet, 31 from Vox, 27 from AS 2024-08-21 06:07:03,218 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 22 from LS+wenet, 24 from Vox, 47 from AS 2024-08-21 06:07:36,250 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 18 from LS+wenet, 14 from Vox, 36 from AS 2024-08-21 06:07:41,404 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5115150.0, ans=0.1 2024-08-21 06:07:43,291 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5115150.0, ans=0.125 2024-08-21 06:07:58,852 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 7750, loss[loss=0.06949, beats_loss=0.01169, ecapa_loss=0.0001409, whisper_loss=0.05639, over 18888.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.0104, ecapa_loss=0.0001401, whisper_loss=0.08856, over 3817119.39 frames. ], batch size: 79, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:08:03,468 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 21 from LS+wenet, 17 from Vox, 23 from AS 2024-08-21 06:08:16,314 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 19 from LS+wenet, 14 from Vox, 31 from AS 2024-08-21 06:08:20,826 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.05 vs. limit=22.5 2024-08-21 06:08:27,371 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 14 from LS+wenet, 14 from Vox, 39 from AS 2024-08-21 06:08:33,920 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.93 vs.
limit=15.0 2024-08-21 06:08:47,510 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5115450.0, ans=0.1 2024-08-21 06:08:59,580 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 21 from LS+wenet, 11 from Vox, 37 from AS 2024-08-21 06:09:03,078 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.270e+01 2.577e+01 2.902e+01 8.135e+01, threshold=5.155e+01, percent-clipped=1.0 2024-08-21 06:09:03,288 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 27 from LS+wenet, 18 from Vox, 35 from AS 2024-08-21 06:09:05,742 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5115550.0, ans=0.0 2024-08-21 06:09:20,239 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=5115650.0, ans=0.2 2024-08-21 06:09:23,884 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5115650.0, ans=0.1 2024-08-21 06:09:30,478 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.47 vs. limit=22.5 2024-08-21 06:09:34,804 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 7800, loss[loss=0.1123, beats_loss=0.01034, ecapa_loss=0.0001234, whisper_loss=0.1007, over 22710.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01041, ecapa_loss=0.0001383, whisper_loss=0.08885, over 3825868.04 frames. ], batch size: 91, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:09:45,723 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5115750.0, ans=0.125 2024-08-21 06:09:48,440 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts.
19 from LS+wenet, 28 from Vox, 42 from AS 2024-08-21 06:10:16,321 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.94 vs. limit=15.0 2024-08-21 06:10:24,103 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=5115950.0, ans=0.09899494936611666 2024-08-21 06:10:42,615 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 29 from LS+wenet, 24 from Vox, 40 from AS 2024-08-21 06:10:56,426 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5116150.0, ans=0.125 2024-08-21 06:11:10,098 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 7850, loss[loss=0.09375, beats_loss=0.01238, ecapa_loss=0.000117, whisper_loss=0.0802, over 21098.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01046, ecapa_loss=0.0001379, whisper_loss=0.0888, over 3821566.38 frames. ], batch size: 84, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:11:17,698 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=5116250.0, ans=0.125 2024-08-21 06:11:25,176 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.59 vs. limit=15.0 2024-08-21 06:11:28,232 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=5116350.0, ans=0.125 2024-08-21 06:11:28,244 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=5116350.0, ans=0.2 2024-08-21 06:11:35,639 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5116350.0, ans=0.1 2024-08-21 06:11:38,429 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts.
22 from LS+wenet, 25 from Vox, 28 from AS 2024-08-21 06:11:50,609 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 19 from LS+wenet, 9 from Vox, 27 from AS 2024-08-21 06:11:51,234 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=5116450.0, ans=0.0 2024-08-21 06:11:54,351 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=5116450.0, ans=0.0 2024-08-21 06:12:03,090 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.63 vs. limit=15.0 2024-08-21 06:12:08,964 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.692e+01 2.271e+01 2.419e+01 2.702e+01 3.999e+01, threshold=4.838e+01, percent-clipped=0.0 2024-08-21 06:12:15,712 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5116550.0, ans=0.125 2024-08-21 06:12:25,887 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=5116650.0, ans=0.2 2024-08-21 06:12:26,317 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.36 vs. limit=15.0 2024-08-21 06:12:41,170 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 7900, loss[loss=0.08929, beats_loss=0.01181, ecapa_loss=0.0001232, whisper_loss=0.07625, over 22686.00 frames. ], tot_loss[loss=0.09999, beats_loss=0.01047, ecapa_loss=0.0001376, whisper_loss=0.08814, over 3808593.43 frames. ], batch size: 90, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:12:51,659 INFO [train_multi_KD3.py:845] (1/4) A total of 95 cuts.
24 from LS+wenet, 20 from Vox, 51 from AS 2024-08-21 06:12:59,522 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=5116850.0, ans=0.0 2024-08-21 06:13:02,507 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=5116850.0, ans=0.125 2024-08-21 06:13:05,291 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 16 from LS+wenet, 9 from Vox, 25 from AS 2024-08-21 06:13:17,394 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.68 vs. limit=10.0 2024-08-21 06:13:23,525 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 18 from LS+wenet, 13 from Vox, 32 from AS 2024-08-21 06:13:40,989 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.66 vs. limit=10.0 2024-08-21 06:13:44,502 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=5117050.0, ans=0.0 2024-08-21 06:13:51,504 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=5117150.0, ans=0.0 2024-08-21 06:13:51,592 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5117150.0, ans=0.125 2024-08-21 06:13:56,266 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 30 from LS+wenet, 16 from Vox, 22 from AS 2024-08-21 06:14:02,233 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5117150.0, ans=0.125 2024-08-21 06:14:10,368 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 7950, loss[loss=0.07819, beats_loss=0.01102, ecapa_loss=0.0001474, whisper_loss=0.0657, over 16589.00 frames.
], tot_loss[loss=0.1004, beats_loss=0.01038, ecapa_loss=0.0001383, whisper_loss=0.08867, over 3812476.45 frames. ], batch size: 71, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:14:23,752 INFO [train_multi_KD3.py:845] (1/4) A total of 52 cuts. 21 from LS+wenet, 12 from Vox, 19 from AS 2024-08-21 06:14:37,201 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=5117350.0, ans=0.125 2024-08-21 06:14:51,511 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=5117450.0, ans=0.95 2024-08-21 06:14:54,218 WARNING [optim.py:496] (1/4) Scaling gradients by 0.07848511636257172, model_norm_threshold=48.37834167480469 2024-08-21 06:14:54,374 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.23, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.905e+04, grad_sumsq=8.269e+06, orig_rms_sq=1.077e-02 2024-08-21 06:15:06,208 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.583e+01 2.240e+01 2.515e+01 2.710e+01 6.164e+02, threshold=5.030e+01, percent-clipped=3.0 2024-08-21 06:15:29,542 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=5117650.0, ans=0.2 2024-08-21 06:15:37,031 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 8000, loss[loss=0.08356, beats_loss=0.01022, ecapa_loss=0.0001815, whisper_loss=0.07152, over 15105.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.0104, ecapa_loss=0.0001385, whisper_loss=0.08884, over 3789198.03 frames. ], batch size: 63, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:15:47,086 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=5117750.0, ans=0.125 2024-08-21 06:15:48,143 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts.
25 from LS+wenet, 17 from Vox, 21 from AS 2024-08-21 06:16:02,573 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 20 from LS+wenet, 12 from Vox, 24 from AS 2024-08-21 06:16:27,879 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 22 from LS+wenet, 29 from Vox, 38 from AS 2024-08-21 06:16:46,141 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=5118050.0, ans=0.125 2024-08-21 06:16:48,051 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=9.886e+00 2024-08-21 06:16:53,455 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.20 vs. limit=15.0 2024-08-21 06:17:00,474 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.60 vs. limit=15.0 2024-08-21 06:17:05,639 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 8050, loss[loss=0.0964, beats_loss=0.01179, ecapa_loss=0.0001367, whisper_loss=0.08324, over 16235.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01042, ecapa_loss=0.0001395, whisper_loss=0.08859, over 3744716.59 frames.
], batch size: 65, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:17:11,673 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=5118250.0, ans=0.02 2024-08-21 06:17:23,552 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=5118350.0, ans=0.0 2024-08-21 06:17:27,255 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5118350.0, ans=0.125 2024-08-21 06:17:28,940 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5118350.0, ans=0.125 2024-08-21 06:17:32,430 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=5118350.0, ans=0.0 2024-08-21 06:17:51,787 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 20 from LS+wenet, 18 from Vox, 30 from AS 2024-08-21 06:18:03,341 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.278e+01 2.668e+01 2.870e+01 8.505e+01, threshold=5.336e+01, percent-clipped=1.0 2024-08-21 06:18:25,497 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.92 vs. limit=15.0 2024-08-21 06:18:32,627 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=5118650.0, ans=0.125 2024-08-21 06:18:34,975 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 8100, loss[loss=0.1064, beats_loss=0.009912, ecapa_loss=0.0001392, whisper_loss=0.0951, over 18542.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01037, ecapa_loss=0.0001393, whisper_loss=0.08944, over 3752538.76 frames.
], batch size: 69, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:18:36,643 INFO [train_multi_KD3.py:845] (1/4) A total of 54 cuts. 16 from LS+wenet, 13 from Vox, 25 from AS 2024-08-21 06:18:44,172 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=5118750.0, ans=0.0 2024-08-21 06:19:24,705 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 15 from LS+wenet, 16 from Vox, 37 from AS 2024-08-21 06:19:29,586 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 11 from LS+wenet, 22 from Vox, 25 from AS 2024-08-21 06:19:31,169 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 30 from LS+wenet, 27 from Vox, 32 from AS 2024-08-21 06:19:48,669 WARNING [optim.py:496] (1/4) Scaling gradients by 0.08069697767496109, model_norm_threshold=53.36321258544922 2024-08-21 06:19:48,824 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.079e+04, grad_sumsq=7.079e+04, orig_rms_sq=1.000e+00 2024-08-21 06:19:49,315 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=5119150.0, ans=0.0 2024-08-21 06:19:52,542 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=5119150.0, ans=0.125 2024-08-21 06:19:52,577 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=5119150.0, ans=0.125 2024-08-21 06:20:01,635 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5119250.0, ans=0.1 2024-08-21 06:20:02,404 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 8150, loss[loss=0.07628, beats_loss=0.01378, ecapa_loss=0.000113, whisper_loss=0.06136, over 18405.00 frames.
], tot_loss[loss=0.1009, beats_loss=0.01045, ecapa_loss=0.0001385, whisper_loss=0.08906, over 3770324.69 frames. ], batch size: 73, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:20:07,112 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=5119250.0, ans=0.0 2024-08-21 06:20:16,662 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 22 from LS+wenet, 17 from Vox, 17 from AS 2024-08-21 06:20:24,330 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.22 vs. limit=22.5 2024-08-21 06:20:27,195 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=5119350.0, ans=0.015 2024-08-21 06:20:30,396 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 22 from LS+wenet, 17 from Vox, 40 from AS 2024-08-21 06:20:48,278 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 31 from LS+wenet, 22 from Vox, 40 from AS 2024-08-21 06:20:53,597 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 21 from LS+wenet, 10 from Vox, 34 from AS 2024-08-21 06:20:58,316 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.955e+01 2.296e+01 2.463e+01 2.711e+01 6.613e+02, threshold=4.926e+01, percent-clipped=2.0 2024-08-21 06:21:00,675 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 36 from LS+wenet, 21 from Vox, 37 from AS 2024-08-21 06:21:26,838 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 17 from LS+wenet, 19 from Vox, 24 from AS 2024-08-21 06:21:27,800 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 8200, loss[loss=0.08464, beats_loss=0.0099, ecapa_loss=0.0001266, whisper_loss=0.07347, over 15233.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01045, ecapa_loss=0.000137, whisper_loss=0.08937, over 3772297.88 frames.
], batch size: 60, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:21:48,654 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5119850.0, ans=0.125 2024-08-21 06:21:49,658 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 23 from LS+wenet, 21 from Vox, 37 from AS 2024-08-21 06:22:13,842 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 15 from LS+wenet, 20 from Vox, 28 from AS 2024-08-21 06:22:27,101 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 22 from LS+wenet, 19 from Vox, 49 from AS 2024-08-21 06:22:41,612 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5120150.0, ans=0.125 2024-08-21 06:22:57,643 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 8250, loss[loss=0.08421, beats_loss=0.01212, ecapa_loss=0.0001422, whisper_loss=0.07067, over 13786.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01042, ecapa_loss=0.0001372, whisper_loss=0.08965, over 3805126.64 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:23:12,075 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 19 from LS+wenet, 17 from Vox, 42 from AS 2024-08-21 06:23:20,468 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 32 from LS+wenet, 18 from Vox, 44 from AS 2024-08-21 06:23:23,414 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 20 from LS+wenet, 26 from Vox, 26 from AS 2024-08-21 06:23:54,414 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.309e+01 2.543e+01 2.823e+01 1.094e+02, threshold=5.085e+01, percent-clipped=1.0 2024-08-21 06:24:10,560 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.62 vs.
limit=15.0 2024-08-21 06:24:25,022 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 8300, loss[loss=0.08915, beats_loss=0.01269, ecapa_loss=0.0001253, whisper_loss=0.07521, over 22237.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01042, ecapa_loss=0.0001377, whisper_loss=0.08954, over 3804366.66 frames. ], batch size: 91, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:24:30,237 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.41 vs. limit=15.0 2024-08-21 06:24:43,205 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5120850.0, ans=0.125 2024-08-21 06:24:48,310 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=5120850.0, ans=0.125 2024-08-21 06:24:58,910 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.57 vs. limit=22.5 2024-08-21 06:25:02,343 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5120950.0, ans=0.125 2024-08-21 06:25:28,817 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 27 from LS+wenet, 18 from Vox, 26 from AS 2024-08-21 06:25:42,512 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=5121150.0, ans=0.5 2024-08-21 06:25:45,740 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 22 from LS+wenet, 23 from Vox, 44 from AS 2024-08-21 06:25:56,023 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 8350, loss[loss=0.11, beats_loss=0.01014, ecapa_loss=0.0001463, whisper_loss=0.09837, over 22823.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01041, ecapa_loss=0.000138, whisper_loss=0.08984, over 3842568.73 frames.
], batch size: 91, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:26:01,138 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=5121250.0, ans=0.0 2024-08-21 06:26:09,127 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=5121250.0, ans=0.0 2024-08-21 06:26:16,624 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5121350.0, ans=0.1 2024-08-21 06:26:32,275 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 24 from LS+wenet, 18 from Vox, 34 from AS 2024-08-21 06:26:32,866 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=5121350.0, ans=0.07 2024-08-21 06:26:47,175 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=5121450.0, ans=0.0 2024-08-21 06:26:51,169 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=5121450.0, ans=0.2 2024-08-21 06:26:55,614 WARNING [optim.py:496] (1/4) Scaling gradients by 0.0374552384018898, model_norm_threshold=50.851959228515625 2024-08-21 06:26:55,771 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.10, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.895e+05, grad_sumsq=1.757e+07, orig_rms_sq=1.078e-02 2024-08-21 06:26:59,720 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.288e+01 2.464e+01 2.782e+01 1.358e+03, threshold=4.928e+01, percent-clipped=2.0 2024-08-21 06:27:02,515 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=5121550.0, ans=0.0 2024-08-21 06:27:26,534 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob,
batch_count=5121650.0, ans=0.125 2024-08-21 06:27:33,206 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 8400, loss[loss=0.1054, beats_loss=0.008318, ecapa_loss=0.0001638, whisper_loss=0.09549, over 18939.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01041, ecapa_loss=0.0001375, whisper_loss=0.08982, over 3871347.70 frames. ], batch size: 77, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:27:43,659 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=5121750.0, ans=0.125 2024-08-21 06:27:56,264 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 24 from LS+wenet, 13 from Vox, 31 from AS 2024-08-21 06:28:05,410 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5121850.0, ans=0.1 2024-08-21 06:28:08,259 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 29 from LS+wenet, 20 from Vox, 43 from AS 2024-08-21 06:28:10,279 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 26 from LS+wenet, 25 from Vox, 35 from AS 2024-08-21 06:28:16,273 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 34 from LS+wenet, 17 from Vox, 32 from AS 2024-08-21 06:28:16,939 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=5121950.0, ans=0.0 2024-08-21 06:28:28,643 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 30 from LS+wenet, 29 from Vox, 32 from AS 2024-08-21 06:28:49,657 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=5122150.0, ans=0.125 2024-08-21 06:28:57,714 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5122150.0, ans=0.125 2024-08-21 06:29:03,433 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts.
22 from LS+wenet, 16 from Vox, 25 from AS 2024-08-21 06:29:06,827 WARNING [optim.py:496] (1/4) Scaling gradients by 0.038237348198890686, model_norm_threshold=49.277313232421875 2024-08-21 06:29:06,985 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.0.norm.log_scale with proportion 0.11, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.805e+05, grad_sumsq=1.805e+05, orig_rms_sq=1.000e+00 2024-08-21 06:29:07,025 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 8450, loss[loss=0.08961, beats_loss=0.01127, ecapa_loss=0.0001359, whisper_loss=0.07698, over 17307.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01041, ecapa_loss=0.0001371, whisper_loss=0.08932, over 3860076.11 frames. ], batch size: 68, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:29:13,780 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5122250.0, ans=0.1 2024-08-21 06:29:33,056 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=5122350.0, ans=0.125 2024-08-21 06:30:11,889 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.430e+01 2.632e+01 3.117e+01 1.289e+03, threshold=5.264e+01, percent-clipped=4.0 2024-08-21 06:30:34,061 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 17 from LS+wenet, 15 from Vox, 18 from AS 2024-08-21 06:30:39,242 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=5122650.0, ans=0.0 2024-08-21 06:30:46,097 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 8500, loss[loss=0.1196, beats_loss=0.009047, ecapa_loss=0.0001454, whisper_loss=0.1091, over 22407.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01043, ecapa_loss=0.0001377, whisper_loss=0.08932, over 3864217.90 frames.
], batch size: 90, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:30:49,336 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=5122750.0, ans=0.5 2024-08-21 06:30:53,038 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=5122750.0, ans=0.2 2024-08-21 06:31:00,292 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 from AS 2024-08-21 06:31:09,296 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5122850.0, ans=0.125 2024-08-21 06:31:11,392 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.78 vs. limit=6.0 2024-08-21 06:31:12,139 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 18 from LS+wenet, 9 from Vox, 35 from AS 2024-08-21 06:31:38,910 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=5122950.0, ans=0.2 2024-08-21 06:31:49,163 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5123050.0, ans=0.1 2024-08-21 06:32:02,176 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 17 from LS+wenet, 18 from Vox, 29 from AS 2024-08-21 06:32:24,531 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 8550, loss[loss=0.0797, beats_loss=0.0104, ecapa_loss=0.0001497, whisper_loss=0.0678, over 14406.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01043, ecapa_loss=0.0001393, whisper_loss=0.08876, over 3852746.19 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:32:32,955 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts.
25 from LS+wenet, 16 from Vox, 23 from AS 2024-08-21 06:32:34,692 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 24 from LS+wenet, 21 from Vox, 35 from AS 2024-08-21 06:32:59,824 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 27 from LS+wenet, 22 from Vox, 34 from AS 2024-08-21 06:33:06,541 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5123450.0, ans=0.0 2024-08-21 06:33:20,594 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.84 vs. limit=22.5 2024-08-21 06:33:30,484 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.375e+01 2.634e+01 2.950e+01 1.431e+02, threshold=5.267e+01, percent-clipped=1.0 2024-08-21 06:34:01,048 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 15 from LS+wenet, 17 from Vox, 30 from AS 2024-08-21 06:34:04,708 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 8600, loss[loss=0.1199, beats_loss=0.008746, ecapa_loss=0.0001351, whisper_loss=0.1098, over 19070.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01046, ecapa_loss=0.00014, whisper_loss=0.08852, over 3846874.93 frames. ], batch size: 74, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:34:32,511 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.61 vs. limit=15.0 2024-08-21 06:34:51,252 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=5123950.0, ans=0.0 2024-08-21 06:34:55,617 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts.
24 from LS+wenet, 16 from Vox, 38 from AS 2024-08-21 06:35:01,975 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=5124050.0, ans=0.0 2024-08-21 06:35:06,466 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5124050.0, ans=0.125 2024-08-21 06:35:27,312 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=5124150.0, ans=0.2 2024-08-21 06:35:27,313 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=5124150.0, ans=0.125 2024-08-21 06:35:30,183 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 24 from LS+wenet, 20 from Vox, 34 from AS 2024-08-21 06:35:42,394 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=5124250.0, ans=0.025 2024-08-21 06:35:42,677 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.47 vs. limit=15.0 2024-08-21 06:35:43,139 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 8650, loss[loss=0.09432, beats_loss=0.01278, ecapa_loss=0.0001133, whisper_loss=0.08041, over 15204.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01042, ecapa_loss=0.00014, whisper_loss=0.08897, over 3832441.56 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:36:03,013 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=5124350.0, ans=0.125 2024-08-21 06:36:17,432 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5124350.0, ans=0.0 2024-08-21 06:36:40,558 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts.
16 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-21 06:36:47,623 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.320e+01 2.649e+01 2.917e+01 4.406e+01, threshold=5.297e+01, percent-clipped=0.0 2024-08-21 06:36:48,742 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5124550.0, ans=0.1 2024-08-21 06:36:58,681 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 24 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-21 06:37:02,164 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.41 vs. limit=15.0 2024-08-21 06:37:14,586 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=5124650.0, ans=0.2 2024-08-21 06:37:24,439 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 8700, loss[loss=0.0806, beats_loss=0.01257, ecapa_loss=0.000111, whisper_loss=0.06692, over 19582.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01044, ecapa_loss=0.0001385, whisper_loss=0.08881, over 3824901.36 frames. ], batch size: 79, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:37:36,531 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 22 from LS+wenet, 34 from Vox, 38 fro AS 2024-08-21 06:37:39,674 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.14 vs. limit=15.0 2024-08-21 06:38:14,314 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5124950.0, ans=0.1 2024-08-21 06:38:30,167 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 
21 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-21 06:38:50,208 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5125150.0, ans=0.1 2024-08-21 06:39:00,036 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 8750, loss[loss=0.1024, beats_loss=0.01103, ecapa_loss=0.0001293, whisper_loss=0.09008, over 22140.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01039, ecapa_loss=0.0001383, whisper_loss=0.08944, over 3815150.90 frames. ], batch size: 89, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:39:17,308 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 19 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-21 06:39:24,786 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=5125350.0, ans=0.0 2024-08-21 06:39:29,936 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 26 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-21 06:39:40,518 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 13 from LS+wenet, 33 from Vox, 33 fro AS 2024-08-21 06:39:58,011 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 34 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-21 06:39:59,061 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.99 vs. 
limit=15.0 2024-08-21 06:40:01,936 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.240e+01 2.524e+01 2.807e+01 1.444e+02, threshold=5.048e+01, percent-clipped=1.0 2024-08-21 06:40:06,838 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5125550.0, ans=0.1 2024-08-21 06:40:19,263 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5125650.0, ans=0.125 2024-08-21 06:40:26,222 INFO [train_multi_KD3.py:845] (1/4) A total of 49 cuts. 19 from LS+wenet, 12 from Vox, 18 fro AS 2024-08-21 06:40:28,263 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5125650.0, ans=0.125 2024-08-21 06:40:34,710 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 8800, loss[loss=0.1, beats_loss=0.01052, ecapa_loss=0.000121, whisper_loss=0.0883, over 13803.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01034, ecapa_loss=0.0001382, whisper_loss=0.08979, over 3771895.29 frames. 
], batch size: 53, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:40:35,460 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=5125750.0, ans=0.0 2024-08-21 06:40:54,083 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5125850.0, ans=0.125 2024-08-21 06:41:16,347 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=5125950.0, ans=0.95 2024-08-21 06:41:16,439 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=5125950.0, ans=0.025 2024-08-21 06:41:25,028 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.71 vs. limit=15.0 2024-08-21 06:41:27,816 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-21 06:41:31,946 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=5126050.0, ans=0.0 2024-08-21 06:41:37,264 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=5126050.0, ans=0.125 2024-08-21 06:41:45,962 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 31 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-21 06:42:01,464 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5126150.0, ans=0.125 2024-08-21 06:42:04,614 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 8850, loss[loss=0.0678, beats_loss=0.009157, ecapa_loss=0.0001487, whisper_loss=0.05716, over 13404.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01043, ecapa_loss=0.0001382, whisper_loss=0.08878, over 3763206.91 frames. 
], batch size: 53, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:42:10,548 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 28 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-21 06:42:12,176 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.76 vs. limit=15.0 2024-08-21 06:42:16,877 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.72 vs. limit=15.0 2024-08-21 06:42:28,950 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 31 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-21 06:42:30,726 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.93 vs. limit=15.0 2024-08-21 06:42:32,841 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 29 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-21 06:43:11,600 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.228e+01 2.528e+01 2.793e+01 5.836e+01, threshold=5.056e+01, percent-clipped=1.0 2024-08-21 06:43:45,881 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 8900, loss[loss=0.1051, beats_loss=0.008754, ecapa_loss=0.0001292, whisper_loss=0.09506, over 19217.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01041, ecapa_loss=0.0001389, whisper_loss=0.08902, over 3791982.63 frames. ], batch size: 75, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:44:14,622 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=5126850.0, ans=0.2 2024-08-21 06:44:29,540 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5126950.0, ans=0.1 2024-08-21 06:45:12,880 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 
15 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-21 06:45:18,661 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 8950, loss[loss=0.128, beats_loss=0.009621, ecapa_loss=0.0001413, whisper_loss=0.117, over 22320.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01035, ecapa_loss=0.0001382, whisper_loss=0.08972, over 3783340.07 frames. ], batch size: 88, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:45:20,154 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.98 vs. limit=6.0 2024-08-21 06:45:26,863 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 23 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-21 06:45:31,613 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=5127250.0, ans=0.0 2024-08-21 06:45:56,595 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-21 06:46:06,243 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=5127450.0, ans=0.0 2024-08-21 06:46:21,344 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.91 vs. limit=22.5 2024-08-21 06:46:23,662 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.641e+01 2.262e+01 2.428e+01 2.785e+01 3.880e+01, threshold=4.857e+01, percent-clipped=0.0 2024-08-21 06:46:45,572 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=5127650.0, ans=0.125 2024-08-21 06:46:56,915 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 9000, loss[loss=0.09488, beats_loss=0.009308, ecapa_loss=0.0001248, whisper_loss=0.08433, over 14375.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01035, ecapa_loss=0.000138, whisper_loss=0.09001, over 3790905.43 frames. 
], batch size: 56, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:46:56,915 INFO [train_multi_KD3.py:1140] (1/4) Computing validation loss 2024-08-21 06:47:28,546 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.1275, 3.1159, 3.8736, 3.7263], device='cuda:1') 2024-08-21 06:47:34,700 INFO [train_multi_KD3.py:1150] (1/4) Epoch 35, validation on ASR_libri: loss=0.2538, beats_loss=0, ecapa_loss=0.0005065, whisper_loss=0.2487, over 931116.00 frames. 2024-08-21 06:47:57,357 INFO [train_multi_KD3.py:1150] (1/4) Epoch 35, validation on SV_voxceleb1: loss=0.003886, beats_loss=0, ecapa_loss=0.0003886, whisper_loss=0, over 944235.00 frames. 2024-08-21 06:48:56,508 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.0091, 2.3577, 2.3652, 2.2824], device='cuda:1') 2024-08-21 06:49:39,512 INFO [train_multi_KD3.py:1150] (1/4) Epoch 35, validation on AT_audioset: loss=0.02296, beats_loss=0.02296, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-21 06:49:39,516 INFO [train_multi_KD3.py:1156] (1/4) Maximum memory allocated so far is 30838MB 2024-08-21 06:49:41,326 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=5127750.0, ans=0.0 2024-08-21 06:49:58,794 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-21 06:50:14,096 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5127950.0, ans=0.125 2024-08-21 06:50:20,404 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 26 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-21 06:51:00,757 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 
20 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-21 06:51:08,173 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 9050, loss[loss=0.0867, beats_loss=0.01294, ecapa_loss=0.0001132, whisper_loss=0.07263, over 21272.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01043, ecapa_loss=0.0001359, whisper_loss=0.08936, over 3808320.03 frames. ], batch size: 86, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:51:15,517 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 15 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-21 06:51:24,286 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.47 vs. limit=15.0 2024-08-21 06:51:25,341 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 19 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-21 06:51:25,840 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5128350.0, ans=0.0 2024-08-21 06:51:41,556 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 33 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-21 06:51:44,902 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=5128350.0, ans=0.0 2024-08-21 06:52:12,252 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.208e+01 2.419e+01 2.777e+01 1.932e+02, threshold=4.839e+01, percent-clipped=1.0 2024-08-21 06:52:37,480 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5128650.0, ans=0.125 2024-08-21 06:52:39,327 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=5128650.0, ans=0.125 2024-08-21 06:52:41,761 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 9100, loss[loss=0.09605, beats_loss=0.01227, ecapa_loss=0.0001318, whisper_loss=0.08246, over 22511.00 frames. 
], tot_loss[loss=0.1012, beats_loss=0.01053, ecapa_loss=0.000136, whisper_loss=0.08935, over 3806223.38 frames. ], batch size: 93, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:52:51,795 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5128750.0, ans=0.1 2024-08-21 06:52:56,145 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 14 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-21 06:53:11,030 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 28 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-21 06:53:19,956 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5128950.0, ans=0.125 2024-08-21 06:53:49,760 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5129050.0, ans=0.1 2024-08-21 06:54:13,293 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=5129150.0, ans=0.0 2024-08-21 06:54:15,675 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 9150, loss[loss=0.09583, beats_loss=0.01057, ecapa_loss=0.0001358, whisper_loss=0.0839, over 16760.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01042, ecapa_loss=0.0001368, whisper_loss=0.08942, over 3799127.71 frames. 
], batch size: 67, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:54:16,642 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=5129250.0, ans=0.0 2024-08-21 06:54:27,328 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=5129250.0, ans=0.2 2024-08-21 06:54:38,477 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=5129350.0, ans=0.0 2024-08-21 06:54:54,301 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5129450.0, ans=0.125 2024-08-21 06:55:00,412 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.72 vs. limit=22.5 2024-08-21 06:55:12,583 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.49 vs. limit=22.5 2024-08-21 06:55:13,437 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 15 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-21 06:55:16,375 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.816e+01 2.222e+01 2.469e+01 2.826e+01 4.057e+01, threshold=4.939e+01, percent-clipped=0.0 2024-08-21 06:55:22,534 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.20 vs. limit=22.5 2024-08-21 06:55:35,362 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5129650.0, ans=0.125 2024-08-21 06:55:35,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=5129650.0, ans=0.95 2024-08-21 06:55:36,434 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 
22 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-21 06:55:46,043 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=5129750.0, ans=0.2 2024-08-21 06:55:46,851 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 9200, loss[loss=0.0955, beats_loss=0.01087, ecapa_loss=0.0001727, whisper_loss=0.0829, over 17124.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01049, ecapa_loss=0.0001371, whisper_loss=0.08875, over 3779158.60 frames. ], batch size: 73, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:55:47,961 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=5129750.0, ans=0.125 2024-08-21 06:55:51,667 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=5129750.0, ans=0.0 2024-08-21 06:56:05,078 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.87 vs. limit=15.0 2024-08-21 06:56:09,768 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 23 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-21 06:56:21,013 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 36 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-21 06:56:22,437 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.41 vs. limit=15.0 2024-08-21 06:56:25,101 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 
21 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-21 06:56:27,648 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5129950.0, ans=0.125 2024-08-21 06:56:34,872 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5129950.0, ans=0.125 2024-08-21 06:56:47,084 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5130050.0, ans=0.1 2024-08-21 06:57:02,106 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=5130150.0, ans=0.125 2024-08-21 06:57:22,052 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 9250, loss[loss=0.08899, beats_loss=0.01273, ecapa_loss=0.0001018, whisper_loss=0.07524, over 20702.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01046, ecapa_loss=0.0001375, whisper_loss=0.08924, over 3763349.27 frames. ], batch size: 81, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:57:25,193 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.81 vs. limit=15.0 2024-08-21 06:57:27,951 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 28 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-21 06:58:01,444 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 36 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-21 06:58:13,831 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.whiten.whitening_limit, batch_count=5130450.0, ans=15.0 2024-08-21 06:58:14,222 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 
24 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-21 06:58:22,823 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=5130550.0, ans=0.2 2024-08-21 06:58:23,650 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.254e+01 2.574e+01 2.946e+01 4.918e+02, threshold=5.149e+01, percent-clipped=3.0 2024-08-21 06:58:44,871 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 35 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-21 06:58:54,184 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5130650.0, ans=0.1 2024-08-21 06:58:58,619 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 9300, loss[loss=0.09655, beats_loss=0.01302, ecapa_loss=0.0001049, whisper_loss=0.08248, over 22331.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01051, ecapa_loss=0.0001367, whisper_loss=0.08903, over 3749983.16 frames. ], batch size: 89, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:59:03,945 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.85 vs. limit=22.5 2024-08-21 06:59:14,344 INFO [train_multi_KD3.py:845] (1/4) A total of 54 cuts. 20 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-21 06:59:34,550 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.23 vs. limit=22.5 2024-08-21 07:00:01,694 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=5131050.0, ans=10.0 2024-08-21 07:00:02,898 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-21 07:00:09,873 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 
21 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-21 07:00:15,878 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 21 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-21 07:00:20,449 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5131150.0, ans=0.125 2024-08-21 07:00:33,900 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 9350, loss[loss=0.1011, beats_loss=0.01001, ecapa_loss=0.0001559, whisper_loss=0.08957, over 22461.00 frames. ], tot_loss[loss=0.101, beats_loss=0.0105, ecapa_loss=0.0001366, whisper_loss=0.08915, over 3779817.16 frames. ], batch size: 90, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:00:56,611 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 32 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-21 07:01:01,321 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5131350.0, ans=0.1 2024-08-21 07:01:22,903 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 34 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-21 07:01:35,565 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.010e+01 2.276e+01 2.548e+01 2.859e+01 2.021e+02, threshold=5.096e+01, percent-clipped=1.0 2024-08-21 07:01:45,819 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5131550.0, ans=0.1 2024-08-21 07:01:48,495 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 19 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-21 07:02:04,210 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 21 from LS+wenet, 9 from Vox, 29 fro AS 2024-08-21 07:02:07,476 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 9400, loss[loss=0.1094, beats_loss=0.009162, ecapa_loss=0.00014, whisper_loss=0.09886, over 18064.00 frames. 
], tot_loss[loss=0.1011, beats_loss=0.01057, ecapa_loss=0.0001358, whisper_loss=0.08917, over 3791752.16 frames. ], batch size: 68, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:02:14,701 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 12 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-21 07:02:19,511 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.663e+01 2024-08-21 07:02:31,678 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 22 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-21 07:02:40,005 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.77 vs. limit=6.0 2024-08-21 07:03:09,057 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=4.660e-02 2024-08-21 07:03:09,058 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5132050.0, ans=0.1 2024-08-21 07:03:10,265 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 28 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-21 07:03:20,529 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=5132150.0, ans=0.0 2024-08-21 07:03:25,097 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 21 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-21 07:03:40,011 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 9450, loss[loss=0.1076, beats_loss=0.008713, ecapa_loss=0.0001856, whisper_loss=0.097, over 15231.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01058, ecapa_loss=0.0001359, whisper_loss=0.08937, over 3833057.32 frames. 
], batch size: 63, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:03:46,896 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=5132250.0, ans=0.125 2024-08-21 07:03:47,124 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.95 vs. limit=22.5 2024-08-21 07:03:49,372 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 18 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-21 07:04:06,006 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 13 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-21 07:04:21,150 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 25 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-21 07:04:25,102 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 20 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-21 07:04:35,154 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0 2024-08-21 07:04:40,568 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.680e+01 2.244e+01 2.507e+01 2.864e+01 1.489e+02, threshold=5.014e+01, percent-clipped=2.0 2024-08-21 07:04:46,921 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 20 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-21 07:05:13,301 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 9500, loss[loss=0.07322, beats_loss=0.01298, ecapa_loss=0.0001186, whisper_loss=0.05905, over 14393.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01061, ecapa_loss=0.0001359, whisper_loss=0.08851, over 3793527.52 frames. 
], batch size: 56, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:05:21,647 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=5132750.0, ans=0.125 2024-08-21 07:05:32,969 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=5132850.0, ans=0.2 2024-08-21 07:05:34,909 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=5132850.0, ans=22.5 2024-08-21 07:05:38,555 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.85 vs. limit=15.0 2024-08-21 07:06:03,221 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.02 vs. limit=22.5 2024-08-21 07:06:25,237 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5133050.0, ans=0.0 2024-08-21 07:06:27,117 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=5133150.0, ans=0.2 2024-08-21 07:06:34,115 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 37 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-21 07:06:39,981 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=5133150.0, ans=0.125 2024-08-21 07:06:47,321 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 9550, loss[loss=0.07475, beats_loss=0.01215, ecapa_loss=0.0001756, whisper_loss=0.06084, over 15079.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01051, ecapa_loss=0.0001361, whisper_loss=0.08969, over 3785109.02 frames. 
], batch size: 64, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:06:52,319 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5133250.0, ans=0.125 2024-08-21 07:07:05,311 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=5133350.0, ans=0.125 2024-08-21 07:07:23,866 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=5133450.0, ans=0.0 2024-08-21 07:07:41,253 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 14 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-21 07:07:49,698 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.309e+01 2.529e+01 2.824e+01 3.800e+01, threshold=5.057e+01, percent-clipped=0.0 2024-08-21 07:08:18,840 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5133750.0, ans=0.1 2024-08-21 07:08:19,676 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 9600, loss[loss=0.09634, beats_loss=0.01049, ecapa_loss=0.0001567, whisper_loss=0.08428, over 21321.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01055, ecapa_loss=0.000137, whisper_loss=0.08949, over 3795153.08 frames. ], batch size: 91, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:08:26,705 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.37 vs. 
limit=15.0 2024-08-21 07:08:31,879 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5133750.0, ans=0.1 2024-08-21 07:09:07,204 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=5133950.0, ans=0.125 2024-08-21 07:09:09,945 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 24 from LS+wenet, 10 from Vox, 27 fro AS 2024-08-21 07:09:26,812 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5134050.0, ans=0.0 2024-08-21 07:09:42,673 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.09 vs. limit=5.0 2024-08-21 07:09:48,710 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 9650, loss[loss=0.09043, beats_loss=0.01126, ecapa_loss=0.0001291, whisper_loss=0.07788, over 19874.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.0105, ecapa_loss=0.0001374, whisper_loss=0.08927, over 3769330.74 frames. ], batch size: 80, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:09:52,989 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=5134250.0, ans=0.125 2024-08-21 07:10:08,063 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.38 vs. limit=15.0 2024-08-21 07:10:16,133 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 
24 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-21 07:10:36,253 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5134450.0, ans=0.125 2024-08-21 07:10:49,970 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.720e+01 2.246e+01 2.506e+01 2.851e+01 2.599e+02, threshold=5.012e+01, percent-clipped=4.0 2024-08-21 07:10:52,186 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 25 from LS+wenet, 35 from Vox, 30 fro AS 2024-08-21 07:11:19,415 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 9700, loss[loss=0.08391, beats_loss=0.01068, ecapa_loss=0.0001379, whisper_loss=0.07184, over 15387.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01054, ecapa_loss=0.0001373, whisper_loss=0.08943, over 3799059.27 frames. ], batch size: 64, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:11:58,606 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5134950.0, ans=0.125 2024-08-21 07:12:19,249 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 25 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-21 07:12:50,846 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 9750, loss[loss=0.103, beats_loss=0.01093, ecapa_loss=0.0001327, whisper_loss=0.0907, over 16300.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01057, ecapa_loss=0.000137, whisper_loss=0.08905, over 3819390.53 frames. ], batch size: 63, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:13:22,941 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.30 vs. 
limit=15.0 2024-08-21 07:13:38,793 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=5135450.0, ans=0.07 2024-08-21 07:13:52,638 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.258e+01 2.458e+01 2.685e+01 1.396e+02, threshold=4.917e+01, percent-clipped=1.0 2024-08-21 07:14:02,249 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=5135650.0, ans=10.0 2024-08-21 07:14:09,857 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=5135650.0, ans=0.0 2024-08-21 07:14:17,067 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.98 vs. limit=15.0 2024-08-21 07:14:20,949 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 9800, loss[loss=0.09059, beats_loss=0.0135, ecapa_loss=0.0001002, whisper_loss=0.07608, over 23462.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01056, ecapa_loss=0.0001367, whisper_loss=0.08938, over 3833969.42 frames. ], batch size: 92, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:14:42,684 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 14 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-21 07:14:43,163 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.43 vs. limit=22.5 2024-08-21 07:14:52,686 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 23 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-21 07:15:15,234 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 25 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-21 07:15:41,918 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=5136150.0, ans=0.1 2024-08-21 07:15:43,725 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 
31 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-21 07:15:47,718 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.08 vs. limit=15.0 2024-08-21 07:15:54,755 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 9850, loss[loss=0.09623, beats_loss=0.01197, ecapa_loss=0.000131, whisper_loss=0.08295, over 15587.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01051, ecapa_loss=0.0001372, whisper_loss=0.08959, over 3828024.99 frames. ], batch size: 62, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:16:16,240 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=5136350.0, ans=0.2 2024-08-21 07:16:53,760 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 22 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-21 07:17:00,768 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.736e+01 2.250e+01 2.454e+01 2.726e+01 7.431e+01, threshold=4.908e+01, percent-clipped=3.0 2024-08-21 07:17:33,744 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 9900, loss[loss=0.1041, beats_loss=0.01017, ecapa_loss=0.0001266, whisper_loss=0.09269, over 17608.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01047, ecapa_loss=0.0001375, whisper_loss=0.08945, over 3803437.28 frames. ], batch size: 69, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:17:38,705 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.23 vs. limit=10.0 2024-08-21 07:17:52,530 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 20 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-21 07:18:09,851 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.39 vs. limit=15.0 2024-08-21 07:18:29,150 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 
21 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-21 07:18:32,128 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5137050.0, ans=0.125 2024-08-21 07:18:49,161 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5137150.0, ans=0.1 2024-08-21 07:19:07,056 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=5137250.0, ans=0.0 2024-08-21 07:19:07,786 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 9950, loss[loss=0.07776, beats_loss=0.01095, ecapa_loss=0.0001641, whisper_loss=0.06517, over 19375.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01051, ecapa_loss=0.0001375, whisper_loss=0.08878, over 3787638.08 frames. ], batch size: 84, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:19:43,888 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5137450.0, ans=0.0 2024-08-21 07:20:04,747 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5137550.0, ans=0.1 2024-08-21 07:20:09,129 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.11 vs. limit=15.0 2024-08-21 07:20:09,790 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 19 from LS+wenet, 23 from Vox, 19 fro AS 2024-08-21 07:20:11,421 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.331e+01 2.493e+01 2.737e+01 3.742e+01, threshold=4.986e+01, percent-clipped=0.0 2024-08-21 07:20:40,311 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 10000, loss[loss=0.1116, beats_loss=0.0103, ecapa_loss=0.0001203, whisper_loss=0.1001, over 17717.00 frames. 
], tot_loss[loss=0.1007, beats_loss=0.01047, ecapa_loss=0.0001381, whisper_loss=0.08887, over 3774160.60 frames. ], batch size: 69, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:20:41,762 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 26 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-21 07:20:45,618 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 27 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-21 07:20:51,491 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 23 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-21 07:21:04,441 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=5137850.0, ans=0.5 2024-08-21 07:21:14,807 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5137850.0, ans=0.0 2024-08-21 07:21:21,244 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 18 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-21 07:21:24,479 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 18 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-21 07:21:25,261 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5137950.0, ans=0.0 2024-08-21 07:21:25,466 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. 
limit=6.0 2024-08-21 07:21:28,378 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=5137950.0, ans=0.0 2024-08-21 07:21:30,267 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=5137950.0, ans=0.0 2024-08-21 07:21:36,392 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-21 07:21:37,549 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 30 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-21 07:21:41,353 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 21 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-21 07:22:04,341 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 19 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-21 07:22:14,654 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 10050, loss[loss=0.1072, beats_loss=0.009533, ecapa_loss=0.0001204, whisper_loss=0.09651, over 21182.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01049, ecapa_loss=0.0001393, whisper_loss=0.08911, over 3775944.96 frames. ], batch size: 82, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:22:42,018 WARNING [optim.py:496] (1/4) Scaling gradients by 0.01775754615664482, model_norm_threshold=49.858680725097656 2024-08-21 07:22:42,176 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.23, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.817e+06, grad_sumsq=1.684e+08, orig_rms_sq=1.079e-02 2024-08-21 07:22:45,277 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=5138350.0, ans=0.2 2024-08-21 07:22:53,278 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5138350.0, ans=0.125 2024-08-21 07:23:22,859 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 
31 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-21 07:23:27,120 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.254e+01 2.564e+01 3.028e+01 2.808e+03, threshold=5.129e+01, percent-clipped=1.0 2024-08-21 07:23:49,034 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5138650.0, ans=0.125 2024-08-21 07:24:02,640 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 10100, loss[loss=0.1055, beats_loss=0.01056, ecapa_loss=0.0001364, whisper_loss=0.09356, over 19318.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01052, ecapa_loss=0.0001384, whisper_loss=0.08959, over 3815640.04 frames. ], batch size: 79, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:24:13,770 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5138750.0, ans=0.125 2024-08-21 07:24:21,031 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 26 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-21 07:24:26,356 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.57 vs. 
limit=15.0 2024-08-21 07:24:32,255 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=5138850.0, ans=0.125 2024-08-21 07:24:56,830 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=5138950.0, ans=0.2 2024-08-21 07:25:09,743 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=5139050.0, ans=0.0 2024-08-21 07:25:11,716 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5139050.0, ans=0.125 2024-08-21 07:25:36,844 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 10150, loss[loss=0.1063, beats_loss=0.007697, ecapa_loss=0.0001848, whisper_loss=0.09678, over 19494.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0105, ecapa_loss=0.0001387, whisper_loss=0.09006, over 3796651.33 frames. ], batch size: 82, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:26:04,668 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.00 vs. limit=15.0 2024-08-21 07:26:08,408 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=5139350.0, ans=0.125 2024-08-21 07:26:14,275 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.96 vs. 
limit=15.0 2024-08-21 07:26:25,235 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=5139450.0, ans=0.04949747468305833 2024-08-21 07:26:37,475 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5139550.0, ans=0.125 2024-08-21 07:26:38,024 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.62 vs. limit=15.0 2024-08-21 07:26:38,387 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 21 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-21 07:26:44,481 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.296e+01 2.505e+01 2.874e+01 3.996e+01, threshold=5.010e+01, percent-clipped=0.0 2024-08-21 07:27:15,372 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 10200, loss[loss=0.09799, beats_loss=0.008732, ecapa_loss=0.0001696, whisper_loss=0.08756, over 18732.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01054, ecapa_loss=0.0001381, whisper_loss=0.08974, over 3821550.77 frames. ], batch size: 78, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:27:27,058 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 21 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-21 07:27:29,826 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5139750.0, ans=0.125 2024-08-21 07:27:53,942 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 20 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-21 07:27:54,789 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5139950.0, ans=0.1 2024-08-21 07:27:55,814 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 
26 from LS+wenet, 14 from Vox, 43 fro AS 2024-08-21 07:27:58,539 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=5139950.0, ans=0.0 2024-08-21 07:28:07,746 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=5139950.0, ans=0.0 2024-08-21 07:28:11,995 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.75 vs. limit=22.5 2024-08-21 07:28:20,962 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5140050.0, ans=0.1 2024-08-21 07:28:34,988 WARNING [optim.py:496] (1/4) Scaling gradients by 0.040334705263376236, model_norm_threshold=50.09689712524414 2024-08-21 07:28:35,147 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.12, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.841e+05, grad_sumsq=1.841e+05, orig_rms_sq=1.000e+00 2024-08-21 07:28:50,992 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 10250, loss[loss=0.09798, beats_loss=0.01162, ecapa_loss=0.0001299, whisper_loss=0.08506, over 22156.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01047, ecapa_loss=0.0001374, whisper_loss=0.08964, over 3841111.53 frames. ], batch size: 92, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:28:53,197 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 22 from LS+wenet, 33 from Vox, 36 fro AS 2024-08-21 07:29:01,573 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 22 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-21 07:29:31,952 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 
29 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-21 07:29:56,023 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.288e+01 2.559e+01 2.960e+01 1.242e+03, threshold=5.118e+01, percent-clipped=2.0 2024-08-21 07:30:10,244 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5140650.0, ans=0.125 2024-08-21 07:30:22,699 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 39 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-21 07:30:28,601 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 10300, loss[loss=0.09463, beats_loss=0.01335, ecapa_loss=0.0001212, whisper_loss=0.08007, over 21006.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01043, ecapa_loss=0.0001386, whisper_loss=0.08981, over 3861295.51 frames. ], batch size: 85, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:30:41,795 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=5140750.0, ans=0.0 2024-08-21 07:30:45,076 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0 2024-08-21 07:31:35,949 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=5140950.0, ans=0.125 2024-08-21 07:32:24,713 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 10350, loss[loss=0.1152, beats_loss=0.009451, ecapa_loss=0.0001353, whisper_loss=0.1044, over 22451.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01042, ecapa_loss=0.0001395, whisper_loss=0.08965, over 3867875.48 frames. ], batch size: 91, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:32:24,904 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 29 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-21 07:32:48,941 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 
29 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-21 07:32:56,581 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5141350.0, ans=0.1 2024-08-21 07:32:59,620 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 28 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-21 07:33:25,686 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 16 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-21 07:33:33,199 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 22 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-21 07:33:34,488 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=5141550.0, ans=0.0 2024-08-21 07:33:34,624 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.22 vs. limit=15.0 2024-08-21 07:33:35,176 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.267e+01 2.630e+01 2.969e+01 5.000e+01, threshold=5.261e+01, percent-clipped=0.0 2024-08-21 07:33:46,175 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=5141650.0, ans=0.0 2024-08-21 07:33:52,207 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.903e+01 2024-08-21 07:33:52,409 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.52 vs. limit=15.0 2024-08-21 07:33:53,087 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 31 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-21 07:33:55,174 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.17 vs. 
limit=15.0 2024-08-21 07:34:01,095 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5141650.0, ans=0.1 2024-08-21 07:34:07,587 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.87 vs. limit=15.0 2024-08-21 07:34:08,027 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 10400, loss[loss=0.1109, beats_loss=0.01097, ecapa_loss=0.0001455, whisper_loss=0.09851, over 21492.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01041, ecapa_loss=0.0001383, whisper_loss=0.08936, over 3827682.02 frames. ], batch size: 85, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:34:10,288 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 25 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-21 07:34:25,604 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=5141750.0, ans=0.2 2024-08-21 07:34:55,780 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 32 from LS+wenet, 27 from Vox, 23 fro AS 2024-08-21 07:34:56,794 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5141950.0, ans=0.0 2024-08-21 07:35:02,058 INFO [train_multi_KD3.py:845] (1/4) A total of 54 cuts. 12 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-21 07:35:03,328 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5141950.0, ans=0.125 2024-08-21 07:35:06,293 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 32 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-21 07:35:19,157 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 20 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-21 07:35:34,671 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 
18 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-21 07:35:35,840 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5142150.0, ans=0.125 2024-08-21 07:35:38,980 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 20 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-21 07:35:52,272 INFO [train_multi_KD3.py:845] (1/4) A total of 52 cuts. 18 from LS+wenet, 19 from Vox, 15 fro AS 2024-08-21 07:35:54,744 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 10450, loss[loss=0.1093, beats_loss=0.006809, ecapa_loss=0.0001626, whisper_loss=0.1009, over 13993.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01036, ecapa_loss=0.0001386, whisper_loss=0.0897, over 3795907.13 frames. ], batch size: 52, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:36:12,591 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5142250.0, ans=0.125 2024-08-21 07:36:33,728 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=5142350.0, ans=0.0 2024-08-21 07:36:58,861 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 16 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-21 07:37:18,064 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.371e+01 2.736e+01 3.043e+01 5.041e+02, threshold=5.472e+01, percent-clipped=3.0 2024-08-21 07:37:53,568 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 10500, loss[loss=0.1026, beats_loss=0.01075, ecapa_loss=0.0001192, whisper_loss=0.09066, over 21017.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01035, ecapa_loss=0.0001382, whisper_loss=0.08962, over 3818703.24 frames. 
], batch size: 79, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:37:56,003 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=5142750.0, ans=0.0 2024-08-21 07:38:00,440 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=5142750.0, ans=0.025 2024-08-21 07:38:11,740 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=5142750.0, ans=0.07 2024-08-21 07:38:12,639 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 16 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-21 07:38:16,610 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=5142850.0, ans=0.0 2024-08-21 07:38:23,345 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-21 07:38:47,613 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5142950.0, ans=0.125 2024-08-21 07:39:24,871 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 24 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-21 07:39:32,733 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 26 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-21 07:39:41,918 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 10550, loss[loss=0.0968, beats_loss=0.01141, ecapa_loss=0.0001054, whisper_loss=0.08434, over 19793.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01026, ecapa_loss=0.0001391, whisper_loss=0.09029, over 3846785.95 frames. 
], batch size: 77, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:39:44,913 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5143250.0, ans=0.1 2024-08-21 07:39:49,306 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=5143250.0, ans=0.0 2024-08-21 07:40:13,763 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5143350.0, ans=0.1 2024-08-21 07:40:50,834 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.310e+01 2.474e+01 2.751e+01 3.009e+02, threshold=4.947e+01, percent-clipped=3.0 2024-08-21 07:41:18,594 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5143650.0, ans=0.125 2024-08-21 07:41:21,240 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.63 vs. limit=15.0 2024-08-21 07:41:21,518 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 10600, loss[loss=0.08697, beats_loss=0.01069, ecapa_loss=0.0001558, whisper_loss=0.07472, over 20336.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01026, ecapa_loss=0.0001387, whisper_loss=0.08988, over 3803107.84 frames. ], batch size: 83, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:41:24,174 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5143750.0, ans=0.125 2024-08-21 07:41:27,163 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 16 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-21 07:41:29,059 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 
20 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-21 07:41:38,976 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5143850.0, ans=0.1 2024-08-21 07:42:01,476 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.39 vs. limit=22.5 2024-08-21 07:42:15,055 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5144050.0, ans=0.125 2024-08-21 07:42:18,674 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5144050.0, ans=0.1 2024-08-21 07:42:30,921 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=5144050.0, ans=0.025 2024-08-21 07:42:55,659 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 10650, loss[loss=0.1087, beats_loss=0.01122, ecapa_loss=0.000113, whisper_loss=0.09631, over 22959.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01024, ecapa_loss=0.0001383, whisper_loss=0.09005, over 3768397.81 frames. ], batch size: 90, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:43:43,506 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.52 vs. limit=22.5 2024-08-21 07:44:00,394 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 20 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-21 07:44:04,172 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.276e+01 2.540e+01 2.903e+01 1.576e+02, threshold=5.081e+01, percent-clipped=1.0 2024-08-21 07:44:16,884 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.79 vs. 
limit=22.5 2024-08-21 07:44:33,452 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 10700, loss[loss=0.1001, beats_loss=0.01002, ecapa_loss=0.0001545, whisper_loss=0.0885, over 22598.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01035, ecapa_loss=0.0001369, whisper_loss=0.08915, over 3728944.40 frames. ], batch size: 94, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:44:39,635 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 24 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-21 07:44:41,133 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.72 vs. limit=22.5 2024-08-21 07:45:23,340 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 28 from LS+wenet, 30 from Vox, 31 fro AS 2024-08-21 07:45:24,121 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=5144950.0, ans=0.0 2024-08-21 07:45:32,451 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 25 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-21 07:45:33,179 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=5145050.0, ans=0.0 2024-08-21 07:45:49,506 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.58 vs. limit=15.0 2024-08-21 07:46:10,560 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 10750, loss[loss=0.09883, beats_loss=0.007377, ecapa_loss=0.0001514, whisper_loss=0.08994, over 14739.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01041, ecapa_loss=0.0001357, whisper_loss=0.08856, over 3744432.09 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:46:19,171 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.68 vs. 
limit=22.5 2024-08-21 07:46:46,860 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 19 from LS+wenet, 25 from Vox, 46 fro AS 2024-08-21 07:46:52,069 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=5145450.0, ans=0.125 2024-08-21 07:47:13,190 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 19 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-21 07:47:16,683 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.270e+01 2.527e+01 2.757e+01 4.165e+01, threshold=5.054e+01, percent-clipped=0.0 2024-08-21 07:47:23,862 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=8.325e-01 2024-08-21 07:47:26,136 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5145550.0, ans=0.1 2024-08-21 07:47:28,575 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 28 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-21 07:47:42,894 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.90 vs. limit=15.0 2024-08-21 07:47:47,530 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 10800, loss[loss=0.07685, beats_loss=0.01145, ecapa_loss=0.0001227, whisper_loss=0.06417, over 17979.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.0105, ecapa_loss=0.0001358, whisper_loss=0.08878, over 3783885.74 frames. ], batch size: 71, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:47:56,352 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5145750.0, ans=0.125 2024-08-21 07:48:10,697 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 
22 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-21 07:49:09,862 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.75 vs. limit=6.0 2024-08-21 07:49:20,938 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 10850, loss[loss=0.09388, beats_loss=0.01103, ecapa_loss=0.0001264, whisper_loss=0.08159, over 17239.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01044, ecapa_loss=0.0001354, whisper_loss=0.08916, over 3795855.74 frames. ], batch size: 67, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:49:24,290 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.70 vs. limit=22.5 2024-08-21 07:49:26,661 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 14 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-21 07:49:43,036 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.377e+01 2024-08-21 07:49:46,793 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5146350.0, ans=0.125 2024-08-21 07:49:48,111 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 
36 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-21 07:49:53,890 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=5146350.0, ans=0.125 2024-08-21 07:50:12,178 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=5146450.0, ans=0.0 2024-08-21 07:50:23,460 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.990e+01 2.344e+01 2.543e+01 2.878e+01 8.431e+01, threshold=5.085e+01, percent-clipped=1.0 2024-08-21 07:50:26,561 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=5146550.0, ans=0.0 2024-08-21 07:50:38,824 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 27 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-21 07:50:41,515 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.74 vs. limit=15.0 2024-08-21 07:50:47,210 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.37 vs. limit=15.0 2024-08-21 07:50:49,462 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 19 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-21 07:50:52,685 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 10900, loss[loss=0.1036, beats_loss=0.01138, ecapa_loss=0.0001218, whisper_loss=0.09105, over 14813.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01044, ecapa_loss=0.0001356, whisper_loss=0.08955, over 3798062.43 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:50:54,815 INFO [train_multi_KD3.py:845] (1/4) A total of 80 cuts. 
19 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-21 07:51:08,148 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5146750.0, ans=0.1 2024-08-21 07:51:43,751 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 20 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-21 07:51:50,305 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5147050.0, ans=0.125 2024-08-21 07:52:20,492 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=5147150.0, ans=0.0 2024-08-21 07:52:23,181 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 10950, loss[loss=0.1143, beats_loss=0.008822, ecapa_loss=0.0001487, whisper_loss=0.104, over 14801.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01044, ecapa_loss=0.0001357, whisper_loss=0.08967, over 3776234.58 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:52:32,224 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 31 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-21 07:53:01,572 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 25 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-21 07:53:10,057 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.77 vs. limit=22.5 2024-08-21 07:53:16,526 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=5147550.0, ans=0.0 2024-08-21 07:53:22,477 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.337e+01 2.519e+01 2.828e+01 1.066e+02, threshold=5.038e+01, percent-clipped=2.0 2024-08-21 07:53:22,796 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 20 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-21 07:53:26,470 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 
22 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-21 07:53:34,865 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 30 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-21 07:53:47,077 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 23 from LS+wenet, 19 from Vox, 48 fro AS 2024-08-21 07:53:48,159 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=5147650.0, ans=0.0 2024-08-21 07:53:52,846 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 11000, loss[loss=0.09224, beats_loss=0.01306, ecapa_loss=0.0001474, whisper_loss=0.0777, over 16967.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01044, ecapa_loss=0.0001358, whisper_loss=0.09024, over 3825252.02 frames. ], batch size: 70, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:53:55,888 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.78 vs. limit=12.0 2024-08-21 07:53:58,065 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0 2024-08-21 07:54:03,128 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.56 vs. 
limit=15.0 2024-08-21 07:54:21,227 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=5147850.0, ans=0.0 2024-08-21 07:54:22,599 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5147850.0, ans=0.1 2024-08-21 07:54:30,800 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=5147950.0, ans=0.035 2024-08-21 07:55:21,760 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 11050, loss[loss=0.1041, beats_loss=0.0122, ecapa_loss=0.0001328, whisper_loss=0.09062, over 22907.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01039, ecapa_loss=0.0001373, whisper_loss=0.09085, over 3824483.93 frames. ], batch size: 91, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:55:22,235 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=5148250.0, ans=0.2 2024-08-21 07:55:27,664 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.58 vs. limit=22.5 2024-08-21 07:55:42,468 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=5148350.0, ans=0.125 2024-08-21 07:55:44,115 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=5148350.0, ans=0.04949747468305833 2024-08-21 07:55:47,198 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5148350.0, ans=0.125 2024-08-21 07:55:47,440 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.92 vs. limit=22.5 2024-08-21 07:55:55,527 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 
23 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-21 07:56:02,545 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 29 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-21 07:56:19,086 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.796e+01 2.275e+01 2.484e+01 2.790e+01 7.658e+01, threshold=4.968e+01, percent-clipped=1.0 2024-08-21 07:56:47,096 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 30 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-21 07:56:48,753 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 11100, loss[loss=0.0976, beats_loss=0.0107, ecapa_loss=0.0001439, whisper_loss=0.08547, over 22142.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0105, ecapa_loss=0.0001362, whisper_loss=0.0903, over 3818738.90 frames. ], batch size: 94, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:56:54,896 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=5148750.0, ans=0.0 2024-08-21 07:56:54,931 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.825e-01 2024-08-21 07:57:07,386 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=5148850.0, ans=0.0 2024-08-21 07:57:07,554 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=5148850.0, ans=15.0 2024-08-21 07:57:15,930 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=5148850.0, ans=0.125 2024-08-21 07:57:26,600 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=5148950.0, ans=10.0 2024-08-21 07:57:35,455 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 
14 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-21 07:57:56,021 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 21 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-21 07:58:17,162 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 27 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-21 07:58:18,679 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 11150, loss[loss=0.101, beats_loss=0.01071, ecapa_loss=0.0001446, whisper_loss=0.08884, over 21026.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01051, ecapa_loss=0.0001361, whisper_loss=0.09009, over 3839991.68 frames. ], batch size: 88, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:58:19,399 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5149250.0, ans=0.1 2024-08-21 07:58:20,444 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 20 from LS+wenet, 21 from Vox, 17 fro AS 2024-08-21 07:58:24,563 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5149250.0, ans=0.125 2024-08-21 07:58:44,589 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.09 vs. 
limit=15.0 2024-08-21 07:59:06,864 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5149450.0, ans=0.1 2024-08-21 07:59:09,462 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5149550.0, ans=0.1 2024-08-21 07:59:17,844 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.724e+01 2.340e+01 2.550e+01 2.884e+01 1.372e+02, threshold=5.100e+01, percent-clipped=2.0 2024-08-21 07:59:23,893 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=5149550.0, ans=0.2 2024-08-21 07:59:34,279 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5149650.0, ans=0.125 2024-08-21 07:59:46,175 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 11200, loss[loss=0.1004, beats_loss=0.01076, ecapa_loss=0.0001337, whisper_loss=0.08833, over 22423.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01045, ecapa_loss=0.0001372, whisper_loss=0.09078, over 3857669.36 frames. ], batch size: 91, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:00:05,229 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-21 08:00:13,891 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=5149850.0, ans=0.2 2024-08-21 08:00:58,166 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 16 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-21 08:01:03,962 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 32 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-21 08:01:08,370 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 24 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-21 08:01:11,926 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 
27 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-21 08:01:20,658 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=5150150.0, ans=0.125 2024-08-21 08:01:29,574 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.43 vs. limit=15.0 2024-08-21 08:01:29,906 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 11250, loss[loss=0.09493, beats_loss=0.01021, ecapa_loss=0.0001394, whisper_loss=0.08332, over 21228.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0104, ecapa_loss=0.0001382, whisper_loss=0.09161, over 3899584.19 frames. ], batch size: 86, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:01:55,584 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=5150350.0, ans=0.07 2024-08-21 08:01:57,956 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.16 vs. limit=15.0 2024-08-21 08:02:03,809 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=17.18 vs. limit=15.0 2024-08-21 08:02:17,781 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 
20 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-21 08:02:32,129 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=5150550.0, ans=0.125 2024-08-21 08:02:42,095 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.960e+01 2.342e+01 2.612e+01 2.998e+01 2.607e+02, threshold=5.224e+01, percent-clipped=1.0 2024-08-21 08:02:49,393 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=5150550.0, ans=0.125 2024-08-21 08:02:51,626 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5150550.0, ans=0.1 2024-08-21 08:02:57,280 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 17 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-21 08:03:12,062 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=5150650.0, ans=0.0 2024-08-21 08:03:15,124 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 11300, loss[loss=0.07746, beats_loss=0.01352, ecapa_loss=0.0001153, whisper_loss=0.06279, over 23092.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01041, ecapa_loss=0.0001388, whisper_loss=0.09074, over 3859959.21 frames. 
], batch size: 93, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:03:37,379 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5150850.0, ans=0.125 2024-08-21 08:03:53,148 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=5150850.0, ans=0.125 2024-08-21 08:04:02,696 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5150950.0, ans=0.125 2024-08-21 08:04:06,762 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5150950.0, ans=0.125 2024-08-21 08:04:19,539 INFO [train_multi_KD3.py:845] (1/4) A total of 69 cuts. 23 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-21 08:04:29,601 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 20 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-21 08:04:44,300 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 29 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-21 08:04:47,634 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 14 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-21 08:04:51,255 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 21 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-21 08:04:54,570 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 11350, loss[loss=0.1194, beats_loss=0.00497, ecapa_loss=0.0002325, whisper_loss=0.1121, over 13994.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01038, ecapa_loss=0.0001392, whisper_loss=0.09066, over 3847262.61 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:05:11,414 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=5151350.0, ans=0.125 2024-08-21 08:05:13,989 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 
36 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-21 08:05:31,774 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 28 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-21 08:05:41,927 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5151450.0, ans=0.125 2024-08-21 08:05:54,818 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=5151550.0, ans=0.125 2024-08-21 08:05:54,828 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5151550.0, ans=0.125 2024-08-21 08:05:57,447 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.719e+01 2.232e+01 2.527e+01 2.803e+01 3.759e+01, threshold=5.053e+01, percent-clipped=0.0 2024-08-21 08:06:01,941 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=5151550.0, ans=0.2 2024-08-21 08:06:03,022 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 22 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-21 08:06:26,840 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 11400, loss[loss=0.0891, beats_loss=0.01266, ecapa_loss=0.0001181, whisper_loss=0.07526, over 22518.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01035, ecapa_loss=0.0001387, whisper_loss=0.09041, over 3842488.82 frames. ], batch size: 92, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:06:38,303 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=5151750.0, ans=0.0 2024-08-21 08:06:41,054 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 
25 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-21 08:06:42,085 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5151750.0, ans=0.1 2024-08-21 08:06:45,813 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=5151850.0, ans=0.125 2024-08-21 08:06:49,656 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=5151850.0, ans=0.125 2024-08-21 08:06:51,345 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=5151850.0, ans=0.025 2024-08-21 08:07:28,281 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.05 vs. limit=15.0 2024-08-21 08:07:34,219 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.14 vs. limit=15.0 2024-08-21 08:07:37,156 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 28 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-21 08:07:43,293 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5152050.0, ans=0.1 2024-08-21 08:08:03,739 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.11 vs. limit=15.0 2024-08-21 08:08:03,750 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=5152150.0, ans=10.0 2024-08-21 08:08:06,647 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 11450, loss[loss=0.1307, beats_loss=0.008275, ecapa_loss=0.0001455, whisper_loss=0.121, over 21579.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.01038, ecapa_loss=0.0001373, whisper_loss=0.0903, over 3825455.14 frames. ], batch size: 85, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:08:17,206 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 12 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-21 08:08:18,815 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5152250.0, ans=0.125 2024-08-21 08:08:25,156 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 28 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-21 08:08:32,625 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.49 vs. limit=15.0 2024-08-21 08:08:38,108 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=5152350.0, ans=0.02 2024-08-21 08:09:07,716 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.98 vs. limit=15.0 2024-08-21 08:09:09,906 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.56 vs. limit=22.5 2024-08-21 08:09:13,688 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=5152550.0, ans=0.0 2024-08-21 08:09:14,595 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.353e+01 2.552e+01 2.800e+01 3.552e+01, threshold=5.103e+01, percent-clipped=0.0 2024-08-21 08:09:16,599 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 
27 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-21 08:09:22,489 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=5152550.0, ans=0.0 2024-08-21 08:09:30,288 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0 2024-08-21 08:09:44,849 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 18 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-21 08:09:46,856 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 11500, loss[loss=0.109, beats_loss=0.008909, ecapa_loss=0.0001713, whisper_loss=0.0984, over 12983.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01036, ecapa_loss=0.0001372, whisper_loss=0.09051, over 3838500.59 frames. ], batch size: 51, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:09:52,916 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 19 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-21 08:10:13,060 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=5152850.0, ans=0.125 2024-08-21 08:10:13,535 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.88 vs. limit=15.0 2024-08-21 08:10:46,756 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.24 vs. limit=15.0 2024-08-21 08:11:17,009 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5153150.0, ans=0.125 2024-08-21 08:11:17,985 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 
33 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-21 08:11:23,282 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 11550, loss[loss=0.1136, beats_loss=0.008243, ecapa_loss=0.0001414, whisper_loss=0.104, over 18828.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01029, ecapa_loss=0.0001366, whisper_loss=0.09097, over 3846467.82 frames. ], batch size: 73, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:11:24,507 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.32 vs. limit=15.0 2024-08-21 08:11:49,332 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 18 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-21 08:11:50,950 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 32 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-21 08:11:52,390 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5153350.0, ans=0.125 2024-08-21 08:12:27,603 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.764e+01 2.384e+01 2.690e+01 2.968e+01 5.018e+01, threshold=5.380e+01, percent-clipped=0.0 2024-08-21 08:12:30,461 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=5153550.0, ans=0.0 2024-08-21 08:12:34,020 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=5153550.0, ans=0.04949747468305833 2024-08-21 08:12:47,387 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5153650.0, ans=0.125 2024-08-21 08:12:51,392 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=5153650.0, ans=0.125 2024-08-21 08:12:55,008 INFO [scaling.py:1024] (1/4) Whitening: 
name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.92 vs. limit=10.0 2024-08-21 08:12:57,140 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 11600, loss[loss=0.07644, beats_loss=0.01285, ecapa_loss=9.37e-05, whisper_loss=0.06265, over 14615.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01024, ecapa_loss=0.0001376, whisper_loss=0.09079, over 3794479.19 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 1.152921504606847e+18 2024-08-21 08:13:05,458 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.14 vs. limit=15.0 2024-08-21 08:13:35,213 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=5153950.0, ans=0.2 2024-08-21 08:13:37,501 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.17 vs. limit=22.5 2024-08-21 08:13:42,575 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5153950.0, ans=0.0 2024-08-21 08:13:46,368 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5153950.0, ans=0.125 2024-08-21 08:13:52,556 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5153950.0, ans=0.125 2024-08-21 08:13:54,591 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-21 08:14:33,694 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 
23 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-21 08:14:34,675 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=5154250.0, ans=0.125 2024-08-21 08:14:35,496 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 11650, loss[loss=0.1003, beats_loss=0.009801, ecapa_loss=0.0001561, whisper_loss=0.08896, over 19820.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01023, ecapa_loss=0.000137, whisper_loss=0.09172, over 3818904.98 frames. ], batch size: 83, lr: 1.74e-03, grad_scale: 1.152921504606847e+18 2024-08-21 08:14:47,954 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 21 from LS+wenet, 8 from Vox, 26 fro AS 2024-08-21 08:15:06,145 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=5154350.0, ans=0.0 2024-08-21 08:15:06,933 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 18 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-21 08:15:08,207 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=5154350.0, ans=0.0 2024-08-21 08:15:12,989 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 24 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-21 08:15:14,117 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=5154450.0, ans=0.0 2024-08-21 08:15:20,978 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=5154450.0, ans=0.2 2024-08-21 08:15:22,948 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.95 vs. limit=6.0 2024-08-21 08:15:34,868 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 
20 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-21 08:15:40,078 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.291e+01 2.549e+01 2.927e+01 7.915e+01, threshold=5.097e+01, percent-clipped=1.0 2024-08-21 08:15:53,204 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=5154650.0, ans=0.125 2024-08-21 08:15:57,758 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=5154650.0, ans=0.125 2024-08-21 08:16:11,410 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 11700, loss[loss=0.1194, beats_loss=0.008371, ecapa_loss=0.000145, whisper_loss=0.1096, over 19579.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01027, ecapa_loss=0.0001384, whisper_loss=0.0915, over 3851872.41 frames. ], batch size: 78, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:16:12,612 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.28 vs. limit=15.0 2024-08-21 08:16:33,480 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.00 vs. limit=10.0 2024-08-21 08:16:42,886 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 27 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-21 08:17:02,223 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 22 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-21 08:17:34,780 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=5155150.0, ans=0.04949747468305833 2024-08-21 08:17:36,191 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=5155150.0, ans=10.0 2024-08-21 08:17:39,777 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 
24 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-21 08:17:41,257 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 11750, loss[loss=0.09284, beats_loss=0.01048, ecapa_loss=0.0001132, whisper_loss=0.08123, over 21502.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01025, ecapa_loss=0.000138, whisper_loss=0.09099, over 3853253.44 frames. ], batch size: 84, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:17:46,389 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5155250.0, ans=0.1 2024-08-21 08:17:56,574 INFO [train_multi_KD3.py:845] (1/4) A total of 82 cuts. 30 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-21 08:17:56,980 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.10 vs. limit=15.0 2024-08-21 08:18:38,314 INFO [train_multi_KD3.py:845] (1/4) A total of 96 cuts. 29 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-21 08:18:41,211 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.756e+01 2.496e+01 2.849e+01 3.219e+01 3.241e+02, threshold=5.697e+01, percent-clipped=3.0 2024-08-21 08:18:49,365 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=5155650.0, ans=0.0 2024-08-21 08:19:03,297 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5155650.0, ans=0.1 2024-08-21 08:19:07,362 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 11800, loss[loss=0.08557, beats_loss=0.01179, ecapa_loss=0.0001154, whisper_loss=0.07262, over 15039.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01027, ecapa_loss=0.0001382, whisper_loss=0.09063, over 3847194.73 frames. 
], batch size: 59, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:19:08,268 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=5155750.0, ans=0.07 2024-08-21 08:19:12,648 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=5155750.0, ans=0.125 2024-08-21 08:19:25,875 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 18 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-21 08:20:04,785 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.67 vs. limit=10.0 2024-08-21 08:20:05,472 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 26 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-21 08:20:19,333 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5156050.0, ans=0.125 2024-08-21 08:20:28,782 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.95 vs. limit=10.0 2024-08-21 08:20:57,602 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 11850, loss[loss=0.09703, beats_loss=0.01077, ecapa_loss=0.0001695, whisper_loss=0.08457, over 21490.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01033, ecapa_loss=0.0001383, whisper_loss=0.09018, over 3833182.18 frames. ], batch size: 94, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:20:57,813 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 33 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-21 08:21:08,415 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.37 vs. 
limit=10.0 2024-08-21 08:21:11,322 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.43 vs. limit=15.0 2024-08-21 08:22:03,021 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.08 vs. limit=15.0 2024-08-21 08:22:11,483 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 19 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-21 08:22:13,737 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 32 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-21 08:22:17,681 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.264e+01 2.462e+01 2.768e+01 4.199e+02, threshold=4.924e+01, percent-clipped=1.0 2024-08-21 08:22:34,467 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5156650.0, ans=0.1 2024-08-21 08:22:48,152 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 11900, loss[loss=0.1125, beats_loss=0.01112, ecapa_loss=0.00011, whisper_loss=0.1003, over 23141.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01042, ecapa_loss=0.0001372, whisper_loss=0.08967, over 3853706.36 frames. ], batch size: 90, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:23:00,620 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5156750.0, ans=0.1 2024-08-21 08:23:18,790 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5156850.0, ans=0.1 2024-08-21 08:23:24,144 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=5156850.0, ans=0.125 2024-08-21 08:23:41,414 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 
23 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-21 08:24:05,734 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=5157050.0, ans=0.125 2024-08-21 08:24:09,443 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5157150.0, ans=0.1 2024-08-21 08:24:32,311 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 11950, loss[loss=0.07802, beats_loss=0.01378, ecapa_loss=0.0001214, whisper_loss=0.06302, over 20777.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01045, ecapa_loss=0.0001371, whisper_loss=0.08971, over 3844471.67 frames. ], batch size: 88, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:24:32,918 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=5157250.0, ans=0.0 2024-08-21 08:24:36,157 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 28 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-21 08:24:54,221 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5157350.0, ans=0.125 2024-08-21 08:24:54,583 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.21 vs. limit=15.0 2024-08-21 08:25:11,032 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 
18 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-21 08:25:16,479 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5157450.0, ans=0.1 2024-08-21 08:25:29,779 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5157450.0, ans=0.125 2024-08-21 08:25:50,798 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.272e+01 2.506e+01 2.845e+01 4.517e+01, threshold=5.012e+01, percent-clipped=0.0 2024-08-21 08:26:27,510 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 12000, loss[loss=0.08036, beats_loss=0.0128, ecapa_loss=0.0001507, whisper_loss=0.06605, over 16865.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01054, ecapa_loss=0.0001376, whisper_loss=0.08957, over 3850244.71 frames. ], batch size: 71, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:26:27,511 INFO [train_multi_KD3.py:1140] (1/4) Computing validation loss 2024-08-21 08:27:05,337 INFO [train_multi_KD3.py:1150] (1/4) Epoch 35, validation on ASR_libri: loss=0.2549, beats_loss=0, ecapa_loss=0.0005016, whisper_loss=0.2499, over 931116.00 frames. 2024-08-21 08:27:31,410 INFO [train_multi_KD3.py:1150] (1/4) Epoch 35, validation on SV_voxceleb1: loss=0.00396, beats_loss=0, ecapa_loss=0.000396, whisper_loss=0, over 944235.00 frames. 2024-08-21 08:27:40,473 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.8677, 2.0355, 2.3492, 2.2103], device='cuda:1') 2024-08-21 08:28:38,427 INFO [zipformer.py:1877] (1/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.5713, 2.8810, 2.7820, 2.8064], device='cuda:1') 2024-08-21 08:29:17,059 INFO [train_multi_KD3.py:1150] (1/4) Epoch 35, validation on AT_audioset: loss=0.02299, beats_loss=0.02299, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-21 08:29:17,068 INFO [train_multi_KD3.py:1156] (1/4) Maximum memory allocated so far is 30838MB 2024-08-21 08:29:18,123 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.05 vs. limit=10.0 2024-08-21 08:29:45,384 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.56 vs. limit=15.0 2024-08-21 08:29:49,563 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.32 vs. limit=15.0 2024-08-21 08:29:58,201 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.08 vs. limit=15.0 2024-08-21 08:29:59,766 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=5157950.0, ans=0.125 2024-08-21 08:30:06,182 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 26 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-21 08:30:25,169 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 19 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-21 08:30:49,450 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 12050, loss[loss=0.09907, beats_loss=0.0109, ecapa_loss=0.0001517, whisper_loss=0.08666, over 21405.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01046, ecapa_loss=0.0001374, whisper_loss=0.09029, over 3842270.81 frames. 
], batch size: 90, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:30:53,737 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5158250.0, ans=0.125 2024-08-21 08:30:57,345 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=5158250.0, ans=0.025 2024-08-21 08:31:22,895 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0 2024-08-21 08:31:25,028 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.66 vs. limit=22.5 2024-08-21 08:31:40,082 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=5158450.0, ans=0.07 2024-08-21 08:31:56,543 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.247e+01 2.409e+01 2.694e+01 3.930e+01, threshold=4.818e+01, percent-clipped=0.0 2024-08-21 08:32:02,541 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5158550.0, ans=0.1 2024-08-21 08:32:03,952 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=5158550.0, ans=0.0 2024-08-21 08:32:19,607 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 16 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-21 08:32:21,147 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.11 vs. limit=6.0 2024-08-21 08:32:31,273 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 12100, loss[loss=0.08736, beats_loss=0.01049, ecapa_loss=0.0001598, whisper_loss=0.07527, over 19298.00 frames. 
], tot_loss[loss=0.102, beats_loss=0.01047, ecapa_loss=0.0001384, whisper_loss=0.09017, over 3872502.35 frames. ], batch size: 79, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:32:57,687 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 16 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-21 08:33:07,458 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-21 08:33:54,136 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5159050.0, ans=0.125 2024-08-21 08:34:24,135 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 12150, loss[loss=0.08649, beats_loss=0.01143, ecapa_loss=0.0001321, whisper_loss=0.07373, over 22648.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01038, ecapa_loss=0.0001387, whisper_loss=0.0909, over 3858349.58 frames. ], batch size: 91, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:34:43,384 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-21 08:34:57,379 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=5159350.0, ans=0.0 2024-08-21 08:35:28,303 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.279e+01 2.549e+01 2.837e+01 4.060e+01, threshold=5.098e+01, percent-clipped=0.0 2024-08-21 08:35:29,426 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=5159550.0, ans=0.125 2024-08-21 08:35:32,889 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=5159550.0, ans=0.125 2024-08-21 08:35:33,919 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 26 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-21 08:35:35,578 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 
28 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-21 08:35:46,053 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 21 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-21 08:35:52,088 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=5159650.0, ans=0.0 2024-08-21 08:35:52,222 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5159650.0, ans=0.125 2024-08-21 08:35:52,226 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=5159650.0, ans=0.0 2024-08-21 08:35:54,430 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 12200, loss[loss=0.07408, beats_loss=0.01434, ecapa_loss=8.72e-05, whisper_loss=0.05887, over 19287.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01042, ecapa_loss=0.0001375, whisper_loss=0.09043, over 3850346.62 frames. ], batch size: 75, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:36:00,900 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 27 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-21 08:36:04,568 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 32 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-21 08:36:17,179 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.49 vs. 
limit=15.0 2024-08-21 08:36:28,459 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5159950.0, ans=0.1 2024-08-21 08:36:44,469 WARNING [optim.py:496] (1/4) Scaling gradients by 0.09456772357225418, model_norm_threshold=50.97699737548828 2024-08-21 08:36:44,628 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.739e+04, grad_sumsq=4.739e+04, orig_rms_sq=1.000e+00 2024-08-21 08:36:50,941 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.27 vs. limit=22.5 2024-08-21 08:37:13,454 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 18 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-21 08:37:23,090 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 12250, loss[loss=0.09881, beats_loss=0.01044, ecapa_loss=0.000163, whisper_loss=0.08674, over 20879.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01043, ecapa_loss=0.0001372, whisper_loss=0.0901, over 3835301.13 frames. ], batch size: 90, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:37:30,437 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 22 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-21 08:37:48,786 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=5160350.0, ans=0.0 2024-08-21 08:37:52,402 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=8.955e-01 2024-08-21 08:37:59,503 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 
12 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-21 08:38:09,165 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=5160450.0, ans=0.0 2024-08-21 08:38:22,001 INFO [train_multi_KD3.py:845] (1/4) A total of 59 cuts. 20 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-21 08:38:25,491 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.259e+01 2.495e+01 2.764e+01 5.391e+02, threshold=4.989e+01, percent-clipped=1.0 2024-08-21 08:38:36,470 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=5160650.0, ans=0.125 2024-08-21 08:38:43,605 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 24 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-21 08:38:52,661 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 12300, loss[loss=0.07098, beats_loss=0.01261, ecapa_loss=0.0001193, whisper_loss=0.05718, over 21286.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01045, ecapa_loss=0.0001367, whisper_loss=0.08993, over 3803579.60 frames. ], batch size: 85, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:39:21,351 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 23 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-21 08:39:42,109 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.17 vs. limit=22.5 2024-08-21 08:39:56,262 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=5161050.0, ans=0.1 2024-08-21 08:40:10,792 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 19 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-21 08:40:20,125 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 
29 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-21 08:40:26,980 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 12350, loss[loss=0.09155, beats_loss=0.01039, ecapa_loss=0.0001179, whisper_loss=0.07998, over 22747.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01039, ecapa_loss=0.0001376, whisper_loss=0.09046, over 3839683.80 frames. ], batch size: 91, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:40:45,538 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 20 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-21 08:41:17,922 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=5161450.0, ans=0.0 2024-08-21 08:41:30,230 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.766e+01 2.295e+01 2.536e+01 2.877e+01 1.914e+02, threshold=5.073e+01, percent-clipped=2.0 2024-08-21 08:41:36,020 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 27 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-21 08:41:36,972 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5161550.0, ans=0.0 2024-08-21 08:41:49,921 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=5161650.0, ans=0.5 2024-08-21 08:41:53,497 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=5161650.0, ans=0.0 2024-08-21 08:41:57,739 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 12400, loss[loss=0.105, beats_loss=0.009267, ecapa_loss=0.0001528, whisper_loss=0.09422, over 16239.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01041, ecapa_loss=0.000138, whisper_loss=0.09013, over 3831005.27 frames. ], batch size: 64, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:42:07,715 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 
21 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-21 08:42:22,210 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 23 from LS+wenet, 29 from Vox, 24 fro AS 2024-08-21 08:42:23,955 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 25 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-21 08:42:25,241 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.38 vs. limit=12.0 2024-08-21 08:42:29,408 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 18 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-21 08:42:30,806 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.53 vs. limit=15.0 2024-08-21 08:42:54,798 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.23 vs. limit=15.0 2024-08-21 08:43:13,804 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.32 vs. limit=6.0 2024-08-21 08:43:20,038 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 21 from LS+wenet, 29 from Vox, 28 fro AS 2024-08-21 08:43:40,241 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5162150.0, ans=0.1 2024-08-21 08:43:44,338 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 12450, loss[loss=0.09392, beats_loss=0.009031, ecapa_loss=0.0001664, whisper_loss=0.08323, over 13839.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01038, ecapa_loss=0.000137, whisper_loss=0.09002, over 3780727.35 frames. 
], batch size: 58, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:43:55,699 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=5162250.0, ans=0.2 2024-08-21 08:44:41,484 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=5162450.0, ans=0.07 2024-08-21 08:44:51,601 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5162550.0, ans=0.1 2024-08-21 08:44:56,051 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.293e+01 2.503e+01 2.840e+01 4.657e+01, threshold=5.005e+01, percent-clipped=0.0 2024-08-21 08:45:27,617 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 12500, loss[loss=0.1061, beats_loss=0.01157, ecapa_loss=0.000136, whisper_loss=0.09316, over 19637.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01037, ecapa_loss=0.0001365, whisper_loss=0.09001, over 3762016.07 frames. ], batch size: 76, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:45:42,945 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=5162750.0, ans=0.125 2024-08-21 08:45:45,467 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.76 vs. limit=15.0 2024-08-21 08:45:52,629 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 16 from LS+wenet, 9 from Vox, 30 fro AS 2024-08-21 08:46:09,223 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=5162950.0, ans=0.0 2024-08-21 08:46:11,898 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.62 vs. 
limit=15.0 2024-08-21 08:46:44,780 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5163050.0, ans=0.125 2024-08-21 08:46:56,239 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=5163150.0, ans=0.09899494936611666 2024-08-21 08:47:05,203 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 36 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-21 08:47:10,763 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 12550, loss[loss=0.1063, beats_loss=0.00931, ecapa_loss=0.0001814, whisper_loss=0.09514, over 19685.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01035, ecapa_loss=0.0001373, whisper_loss=0.08992, over 3793932.19 frames. ], batch size: 81, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:47:16,910 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 18 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-21 08:47:21,930 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=5163250.0, ans=0.0 2024-08-21 08:47:31,201 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 18 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-21 08:47:48,512 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 24 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-21 08:47:50,603 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 23 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-21 08:47:56,607 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=5163450.0, ans=0.02 2024-08-21 08:48:03,458 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=5163450.0, ans=0.125 2024-08-21 08:48:10,981 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.59 vs. 
limit=15.0 2024-08-21 08:48:14,355 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 20 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-21 08:48:24,728 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.72 vs. limit=10.0 2024-08-21 08:48:26,509 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.391e+01 2.623e+01 3.039e+01 4.282e+01, threshold=5.246e+01, percent-clipped=0.0 2024-08-21 08:48:33,227 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=5163550.0, ans=0.125 2024-08-21 08:48:41,825 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 23 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-21 08:48:43,102 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=5163650.0, ans=0.0 2024-08-21 08:48:50,439 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5163650.0, ans=0.0 2024-08-21 08:48:53,023 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 15 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-21 08:48:56,048 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 12600, loss[loss=0.09346, beats_loss=0.01173, ecapa_loss=0.0001037, whisper_loss=0.08069, over 23257.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01037, ecapa_loss=0.0001368, whisper_loss=0.08984, over 3783863.44 frames. ], batch size: 91, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:48:57,372 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5163750.0, ans=0.125 2024-08-21 08:48:59,429 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.34 vs. 
limit=15.0 2024-08-21 08:49:07,840 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=5163750.0, ans=0.0 2024-08-21 08:49:21,897 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 33 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-21 08:49:26,082 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5163850.0, ans=0.0 2024-08-21 08:49:33,766 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=5163950.0, ans=0.0 2024-08-21 08:49:49,327 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 20 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-21 08:49:51,835 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=5164050.0, ans=0.2 2024-08-21 08:50:27,507 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 12650, loss[loss=0.1095, beats_loss=0.009356, ecapa_loss=0.0001437, whisper_loss=0.09867, over 22416.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0103, ecapa_loss=0.0001371, whisper_loss=0.09102, over 3823840.51 frames. ], batch size: 90, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:50:30,147 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.21 vs. limit=10.0 2024-08-21 08:50:32,844 INFO [train_multi_KD3.py:845] (1/4) A total of 77 cuts. 21 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-21 08:50:34,724 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 
21 from LS+wenet, 16 from Vox, 36 fro AS
2024-08-21 08:50:36,875 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5164250.0, ans=0.125
2024-08-21 08:50:40,912 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.23 vs. limit=15.0
2024-08-21 08:50:53,483 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 17 from LS+wenet, 13 from Vox, 32 fro AS
2024-08-21 08:50:58,957 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5164350.0, ans=0.1
2024-08-21 08:51:08,711 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.32 vs. limit=15.0
2024-08-21 08:51:30,809 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+01 2.313e+01 2.541e+01 2.786e+01 6.490e+01, threshold=5.082e+01, percent-clipped=1.0
2024-08-21 08:51:42,957 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=5164650.0, ans=0.0
2024-08-21 08:51:45,957 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 26 from LS+wenet, 20 from Vox, 45 fro AS
2024-08-21 08:51:58,192 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 12700, loss[loss=0.1104, beats_loss=0.009211, ecapa_loss=0.0001184, whisper_loss=0.1, over 20409.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01029, ecapa_loss=0.0001378, whisper_loss=0.0906, over 3842114.57 frames.
], batch size: 78, lr: 1.74e-03, grad_scale: 5.764607523034235e+17
2024-08-21 08:52:14,829 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=5164750.0, ans=0.125
2024-08-21 08:52:27,906 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=5164850.0, ans=0.2
2024-08-21 08:52:47,184 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 31 from LS+wenet, 15 from Vox, 45 fro AS
2024-08-21 08:52:59,863 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5164950.0, ans=0.1
2024-08-21 08:53:25,019 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 25 from LS+wenet, 12 from Vox, 30 fro AS
2024-08-21 08:53:26,684 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=5165150.0, ans=0.09899494936611666
2024-08-21 08:53:33,827 WARNING [optim.py:496] (1/4) Scaling gradients by 0.0630233883857727, model_norm_threshold=50.820472717285156
2024-08-21 08:53:33,987 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.38, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.448e+05, grad_sumsq=2.272e+07, orig_rms_sq=1.077e-02
2024-08-21 08:53:48,956 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 12750, loss[loss=0.09682, beats_loss=0.01066, ecapa_loss=0.0001348, whisper_loss=0.08481, over 17289.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0104, ecapa_loss=0.0001357, whisper_loss=0.09102, over 3814991.36 frames. ], batch size: 69, lr: 1.74e-03, grad_scale: 5.764607523034235e+17
2024-08-21 08:54:00,147 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 29 from LS+wenet, 22 from Vox, 36 fro AS
2024-08-21 08:54:10,222 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.03 vs.
limit=15.0
2024-08-21 08:54:32,139 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.12 vs. limit=22.5
2024-08-21 08:54:58,465 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.83 vs. limit=12.0
2024-08-21 08:55:07,808 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.308e+01 2.534e+01 2.848e+01 8.064e+02, threshold=5.067e+01, percent-clipped=1.0
2024-08-21 08:55:16,825 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 14 from LS+wenet, 18 from Vox, 30 fro AS
2024-08-21 08:55:27,494 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.177e+01
2024-08-21 08:55:43,311 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 12800, loss[loss=0.1155, beats_loss=0.01134, ecapa_loss=0.0001371, whisper_loss=0.1028, over 22753.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01032, ecapa_loss=0.0001373, whisper_loss=0.09127, over 3820735.37 frames. ], batch size: 89, lr: 1.74e-03, grad_scale: 5.764607523034235e+17
2024-08-21 08:55:44,464 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=5165750.0, ans=0.2
2024-08-21 08:56:11,053 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=5165850.0, ans=0.0
2024-08-21 08:56:33,061 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts.
21 from LS+wenet, 24 from Vox, 47 fro AS
2024-08-21 08:56:39,454 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5165950.0, ans=0.125
2024-08-21 08:56:48,891 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5165950.0, ans=0.125
2024-08-21 08:56:48,911 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5165950.0, ans=0.125
2024-08-21 08:57:19,905 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5166150.0, ans=0.0
2024-08-21 08:57:23,554 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.12 vs. limit=22.5
2024-08-21 08:57:35,501 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=5166250.0, ans=0.0
2024-08-21 08:57:36,101 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 12850, loss[loss=0.07338, beats_loss=0.01169, ecapa_loss=0.0001578, whisper_loss=0.06011, over 15027.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01033, ecapa_loss=0.0001379, whisper_loss=0.09138, over 3820387.99 frames. ], batch size: 61, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 08:57:51,411 INFO [train_multi_KD3.py:845] (1/4) A total of 62 cuts. 19 from LS+wenet, 21 from Vox, 22 fro AS
2024-08-21 08:58:10,614 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=5166350.0, ans=0.0
2024-08-21 08:58:26,820 INFO [train_multi_KD3.py:845] (1/4) A total of 54 cuts. 23 from LS+wenet, 16 from Vox, 15 fro AS
2024-08-21 08:58:58,840 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts.
13 from LS+wenet, 18 from Vox, 33 fro AS
2024-08-21 08:59:02,490 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.663e+01 2.226e+01 2.436e+01 2.740e+01 3.525e+01, threshold=4.872e+01, percent-clipped=0.0
2024-08-21 08:59:18,344 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.22 vs. limit=15.0
2024-08-21 08:59:36,013 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 12900, loss[loss=0.1044, beats_loss=0.007947, ecapa_loss=0.0001528, whisper_loss=0.09493, over 14544.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01036, ecapa_loss=0.0001388, whisper_loss=0.09084, over 3782457.23 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 09:00:19,412 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 23 from LS+wenet, 14 from Vox, 27 fro AS
2024-08-21 09:00:36,147 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 20 from LS+wenet, 27 from Vox, 26 fro AS
2024-08-21 09:00:52,950 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=5167050.0, ans=0.125
2024-08-21 09:00:54,864 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.81 vs. limit=15.0
2024-08-21 09:01:36,899 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=5167150.0, ans=0.125
2024-08-21 09:01:36,918 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5167150.0, ans=0.1
2024-08-21 09:01:41,506 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 12950, loss[loss=0.1007, beats_loss=0.01344, ecapa_loss=9.524e-05, whisper_loss=0.08629, over 23786.00 frames.
], tot_loss[loss=0.1021, beats_loss=0.01048, ecapa_loss=0.0001383, whisper_loss=0.09021, over 3797379.33 frames. ], batch size: 92, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 09:02:18,318 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=5167350.0, ans=0.125
2024-08-21 09:02:39,960 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=5167450.0, ans=0.0
2024-08-21 09:02:53,334 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5167450.0, ans=0.125
2024-08-21 09:03:14,837 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.293e+01 2.527e+01 2.916e+01 2.821e+02, threshold=5.054e+01, percent-clipped=3.0
2024-08-21 09:03:15,097 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 26 from LS+wenet, 12 from Vox, 37 fro AS
2024-08-21 09:03:47,927 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 33 from LS+wenet, 23 from Vox, 38 fro AS
2024-08-21 09:03:52,025 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.42 vs. limit=22.5
2024-08-21 09:03:53,536 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 13000, loss[loss=0.05594, beats_loss=0.01558, ecapa_loss=0.0001114, whisper_loss=0.03924, over 12943.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01043, ecapa_loss=0.0001385, whisper_loss=0.09046, over 3812739.50 frames.
], batch size: 55, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 09:04:17,856 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=5167850.0, ans=0.2
2024-08-21 09:04:39,089 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5167850.0, ans=0.125
2024-08-21 09:05:08,361 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5168050.0, ans=0.125
2024-08-21 09:05:12,335 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.145e+05
2024-08-21 09:05:18,316 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5168050.0, ans=0.125
2024-08-21 09:05:30,956 INFO [scaling.py:1024] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.31 vs. limit=8.0
2024-08-21 09:05:47,082 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 13050, loss[loss=0.1172, beats_loss=0.00737, ecapa_loss=0.0001608, whisper_loss=0.1082, over 17670.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01057, ecapa_loss=0.0001374, whisper_loss=0.08983, over 3848039.77 frames. ], batch size: 71, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 09:05:55,317 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 27 from LS+wenet, 32 from Vox, 34 fro AS
2024-08-21 09:05:55,884 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.94 vs. limit=15.0
2024-08-21 09:06:15,178 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=5168350.0, ans=22.5
2024-08-21 09:06:36,632 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts.
30 from LS+wenet, 13 from Vox, 21 fro AS
2024-08-21 09:06:53,841 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.754e+01 2.244e+01 2.565e+01 2.822e+01 8.760e+01, threshold=5.130e+01, percent-clipped=2.0
2024-08-21 09:07:10,138 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 18 from LS+wenet, 20 from Vox, 40 fro AS
2024-08-21 09:07:21,780 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 23 from LS+wenet, 18 from Vox, 24 fro AS
2024-08-21 09:07:26,635 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 13100, loss[loss=0.09165, beats_loss=0.009683, ecapa_loss=0.0001522, whisper_loss=0.08044, over 16566.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01048, ecapa_loss=0.0001387, whisper_loss=0.08998, over 3798246.21 frames. ], batch size: 69, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 09:07:56,042 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=5168850.0, ans=0.0
2024-08-21 09:08:03,205 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5168850.0, ans=0.1
2024-08-21 09:08:16,771 INFO [train_multi_KD3.py:845] (1/4) A total of 58 cuts. 14 from LS+wenet, 20 from Vox, 24 fro AS
2024-08-21 09:08:23,085 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=5168950.0, ans=0.125
2024-08-21 09:08:25,090 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 21 from LS+wenet, 20 from Vox, 23 fro AS
2024-08-21 09:08:27,968 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=5168950.0, ans=0.2
2024-08-21 09:08:39,667 INFO [train_multi_KD3.py:845] (1/4) A total of 95 cuts.
24 from LS+wenet, 25 from Vox, 46 fro AS
2024-08-21 09:08:42,967 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=5169050.0, ans=0.125
2024-08-21 09:08:55,874 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=5169050.0, ans=0.125
2024-08-21 09:08:58,777 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.21 vs. limit=15.0
2024-08-21 09:09:14,360 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.03 vs. limit=15.0
2024-08-21 09:09:31,225 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 13150, loss[loss=0.07879, beats_loss=0.01373, ecapa_loss=0.000107, whisper_loss=0.064, over 22799.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01051, ecapa_loss=0.0001391, whisper_loss=0.08882, over 3769170.80 frames.
], batch size: 94, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 09:09:36,795 WARNING [optim.py:496] (1/4) Scaling gradients by 0.0913950502872467, model_norm_threshold=51.30171203613281
2024-08-21 09:09:36,952 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.48, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.527e+05, grad_sumsq=4.631e+04, orig_rms_sq=3.298e+00
2024-08-21 09:09:50,703 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=5169250.0, ans=0.125
2024-08-21 09:09:58,025 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5169350.0, ans=0.1
2024-08-21 09:10:02,161 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5169350.0, ans=0.1
2024-08-21 09:10:17,661 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 22 from LS+wenet, 21 from Vox, 50 fro AS
2024-08-21 09:10:22,705 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=5169450.0, ans=0.125
2024-08-21 09:10:32,706 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.41 vs. limit=15.0
2024-08-21 09:10:32,706 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=5169450.0, ans=15.0
2024-08-21 09:10:48,029 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 31 from LS+wenet, 21 from Vox, 40 fro AS
2024-08-21 09:10:59,217 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.541e+01 2.235e+01 2.478e+01 2.768e+01 5.613e+02, threshold=4.956e+01, percent-clipped=2.0
2024-08-21 09:11:07,605 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts.
20 from LS+wenet, 15 from Vox, 35 fro AS
2024-08-21 09:11:37,586 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 13200, loss[loss=0.1245, beats_loss=0.009988, ecapa_loss=0.0001342, whisper_loss=0.1132, over 22795.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.0105, ecapa_loss=0.0001393, whisper_loss=0.08898, over 3791559.73 frames. ], batch size: 89, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 09:11:41,562 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=5169750.0, ans=0.125
2024-08-21 09:12:35,999 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.18 vs. limit=15.0
2024-08-21 09:12:55,808 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=5170050.0, ans=0.2
2024-08-21 09:13:10,008 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 27 from LS+wenet, 20 from Vox, 26 fro AS
2024-08-21 09:13:10,257 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5170050.0, ans=0.125
2024-08-21 09:13:12,128 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 26 from LS+wenet, 16 from Vox, 31 fro AS
2024-08-21 09:13:15,077 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts. 23 from LS+wenet, 11 from Vox, 21 fro AS
2024-08-21 09:13:35,210 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=5170150.0, ans=0.125
2024-08-21 09:13:41,443 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 13250, loss[loss=0.1113, beats_loss=0.01061, ecapa_loss=0.0001219, whisper_loss=0.09943, over 23517.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01046, ecapa_loss=0.0001392, whisper_loss=0.08921, over 3788821.01 frames.
], batch size: 92, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 09:13:44,304 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 26 from LS+wenet, 21 from Vox, 42 fro AS
2024-08-21 09:14:18,422 INFO [train_multi_KD3.py:845] (1/4) A total of 81 cuts. 27 from LS+wenet, 22 from Vox, 32 fro AS
2024-08-21 09:14:18,712 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5170350.0, ans=0.125
2024-08-21 09:14:26,019 INFO [train_multi_KD3.py:845] (1/4) A total of 96 cuts. 26 from LS+wenet, 22 from Vox, 48 fro AS
2024-08-21 09:14:40,180 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-21 09:14:55,788 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=5170450.0, ans=0.2
2024-08-21 09:15:16,426 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.287e+01 2.552e+01 2.926e+01 1.195e+02, threshold=5.104e+01, percent-clipped=1.0
2024-08-21 09:15:21,801 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.69 vs. limit=15.0
2024-08-21 09:15:22,022 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.47 vs. limit=15.0
2024-08-21 09:15:40,538 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.24 vs. limit=15.0
2024-08-21 09:15:44,998 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.37 vs. limit=15.0
2024-08-21 09:15:53,591 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 13300, loss[loss=0.1138, beats_loss=0.009129, ecapa_loss=0.0001214, whisper_loss=0.1034, over 17665.00 frames.
], tot_loss[loss=0.1015, beats_loss=0.01045, ecapa_loss=0.0001387, whisper_loss=0.08965, over 3835923.55 frames. ], batch size: 67, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 09:15:59,101 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=5170750.0, ans=0.125
2024-08-21 09:16:00,969 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=5170750.0, ans=0.0
2024-08-21 09:16:20,752 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0
2024-08-21 09:16:22,864 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5170850.0, ans=0.125
2024-08-21 09:16:23,274 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.64 vs. limit=10.0
2024-08-21 09:16:28,781 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=5170850.0, ans=0.125
2024-08-21 09:16:43,191 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 9 from LS+wenet, 19 from Vox, 28 fro AS
2024-08-21 09:16:46,518 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=5170950.0, ans=0.125
2024-08-21 09:16:46,808 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs.
limit=6.0
2024-08-21 09:17:15,217 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=5171050.0, ans=0.0
2024-08-21 09:17:15,396 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.95 vs. limit=10.0
2024-08-21 09:17:17,665 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.63 vs. limit=15.0
2024-08-21 09:17:45,129 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.13 vs. limit=15.0
2024-08-21 09:17:52,245 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 28 from LS+wenet, 23 from Vox, 41 fro AS
2024-08-21 09:17:52,511 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=5171150.0, ans=0.0
2024-08-21 09:17:54,537 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=5171150.0, ans=0.0
2024-08-21 09:17:58,194 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 13350, loss[loss=0.0919, beats_loss=0.01104, ecapa_loss=0.000158, whisper_loss=0.07929, over 18909.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01041, ecapa_loss=0.0001401, whisper_loss=0.08901, over 3815578.53 frames. ], batch size: 79, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 09:18:13,388 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts.
22 from LS+wenet, 20 from Vox, 21 fro AS
2024-08-21 09:18:30,358 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5171350.0, ans=0.125
2024-08-21 09:18:43,137 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.36 vs. limit=6.0
2024-08-21 09:19:19,654 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=5171550.0, ans=0.2
2024-08-21 09:19:23,100 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.340e+01 2.564e+01 2.896e+01 2.938e+02, threshold=5.128e+01, percent-clipped=2.0
2024-08-21 09:19:25,882 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 26 from LS+wenet, 19 from Vox, 28 fro AS
2024-08-21 09:19:38,749 WARNING [optim.py:496] (1/4) Scaling gradients by 0.09627310186624527, model_norm_threshold=51.2801628112793
2024-08-21 09:19:38,906 INFO [optim.py:564] (1/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.norm.log_scale with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.670e+04, grad_sumsq=3.670e+04, orig_rms_sq=1.000e+00
2024-08-21 09:19:39,481 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=5171650.0, ans=0.125
2024-08-21 09:19:48,400 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5171650.0, ans=0.125
2024-08-21 09:19:58,034 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=5171750.0, ans=0.0
2024-08-21 09:19:59,572 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 13400, loss[loss=0.09931, beats_loss=0.01143, ecapa_loss=0.0001506, whisper_loss=0.08637, over 21412.00 frames.
], tot_loss[loss=0.1015, beats_loss=0.01032, ecapa_loss=0.0001393, whisper_loss=0.08978, over 3784890.70 frames. ], batch size: 89, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 09:20:57,624 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5171950.0, ans=0.125
2024-08-21 09:21:21,965 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 18 from LS+wenet, 15 from Vox, 27 fro AS
2024-08-21 09:21:52,710 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5172150.0, ans=0.1
2024-08-21 09:22:00,638 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 13450, loss[loss=0.1098, beats_loss=0.008353, ecapa_loss=0.0001515, whisper_loss=0.09997, over 22404.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01021, ecapa_loss=0.00014, whisper_loss=0.08999, over 3763324.84 frames. ], batch size: 90, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 09:22:03,830 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=5172250.0, ans=0.07
2024-08-21 09:22:50,278 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 32 from LS+wenet, 16 from Vox, 41 fro AS
2024-08-21 09:22:51,036 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.66 vs. limit=15.0
2024-08-21 09:23:19,932 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.770e+01 2.314e+01 2.499e+01 2.868e+01 5.327e+02, threshold=4.997e+01, percent-clipped=2.0
2024-08-21 09:23:22,109 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 25 from LS+wenet, 28 from Vox, 39 fro AS
2024-08-21 09:23:24,003 INFO [train_multi_KD3.py:845] (1/4) A total of 55 cuts.
15 from LS+wenet, 15 from Vox, 25 fro AS
2024-08-21 09:23:54,378 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 13500, loss[loss=0.07218, beats_loss=0.01344, ecapa_loss=0.0001209, whisper_loss=0.05753, over 15552.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01037, ecapa_loss=0.0001397, whisper_loss=0.08965, over 3796621.33 frames. ], batch size: 64, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 09:24:34,021 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5172850.0, ans=0.1
2024-08-21 09:24:57,954 INFO [train_multi_KD3.py:845] (1/4) A total of 50 cuts. 19 from LS+wenet, 15 from Vox, 16 fro AS
2024-08-21 09:25:08,313 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 28 from LS+wenet, 31 from Vox, 34 fro AS
2024-08-21 09:25:31,778 INFO [train_multi_KD3.py:845] (1/4) A total of 94 cuts. 35 from LS+wenet, 22 from Vox, 37 fro AS
2024-08-21 09:25:33,951 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5173150.0, ans=0.1
2024-08-21 09:25:34,057 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=5173150.0, ans=0.2
2024-08-21 09:25:38,720 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5173150.0, ans=0.125
2024-08-21 09:25:38,792 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5173150.0, ans=0.125
2024-08-21 09:25:52,233 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 13550, loss[loss=0.07359, beats_loss=0.01431, ecapa_loss=0.0001382, whisper_loss=0.0579, over 19547.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01037, ecapa_loss=0.0001397, whisper_loss=0.08948, over 3802678.65 frames.
], batch size: 85, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 09:25:52,846 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=5173250.0, ans=0.04949747468305833
2024-08-21 09:26:04,393 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 23 from LS+wenet, 24 from Vox, 45 fro AS
2024-08-21 09:26:29,377 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=5173350.0, ans=0.0
2024-08-21 09:26:31,825 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5173350.0, ans=0.125
2024-08-21 09:26:31,834 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=5173350.0, ans=0.2
2024-08-21 09:26:41,519 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 23 from LS+wenet, 16 from Vox, 27 fro AS
2024-08-21 09:27:07,135 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 29 from LS+wenet, 22 from Vox, 42 fro AS
2024-08-21 09:27:16,462 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.759e+01 2.212e+01 2.430e+01 2.813e+01 4.061e+01, threshold=4.861e+01, percent-clipped=0.0
2024-08-21 09:27:24,687 INFO [train_multi_KD3.py:845] (1/4) A total of 96 cuts. 24 from LS+wenet, 26 from Vox, 46 fro AS
2024-08-21 09:27:41,423 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=5173650.0, ans=0.125
2024-08-21 09:27:53,912 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 13600, loss[loss=0.08533, beats_loss=0.01056, ecapa_loss=0.0001495, whisper_loss=0.07327, over 15622.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01034, ecapa_loss=0.0001402, whisper_loss=0.08955, over 3821689.32 frames. ], batch size: 64, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 09:27:59,544 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts.
28 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-21 09:28:07,657 INFO [train_multi_KD3.py:845] (1/4) A total of 87 cuts. 32 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-21 09:29:15,205 INFO [train_multi_KD3.py:845] (1/4) A total of 66 cuts. 18 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-21 09:29:40,009 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 28 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-21 09:29:59,834 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 13650, loss[loss=0.1064, beats_loss=0.006945, ecapa_loss=0.0001792, whisper_loss=0.09764, over 12723.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01026, ecapa_loss=0.0001392, whisper_loss=0.09009, over 3824061.65 frames. ], batch size: 51, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:30:00,758 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5174250.0, ans=0.125 2024-08-21 09:30:02,639 INFO [train_multi_KD3.py:845] (1/4) A total of 72 cuts. 20 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-21 09:30:50,434 INFO [train_multi_KD3.py:845] (1/4) A total of 61 cuts. 19 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-21 09:31:05,407 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.72 vs. limit=10.0 2024-08-21 09:31:26,414 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.617e+01 2.278e+01 2.472e+01 2.664e+01 8.830e+01, threshold=4.945e+01, percent-clipped=1.0 2024-08-21 09:31:44,215 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.45 vs. limit=15.0 2024-08-21 09:32:04,218 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 13700, loss[loss=0.09902, beats_loss=0.01131, ecapa_loss=0.0001627, whisper_loss=0.08608, over 20636.00 frames. 
], tot_loss[loss=0.1014, beats_loss=0.01031, ecapa_loss=0.0001396, whisper_loss=0.08971, over 3826443.65 frames. ], batch size: 85, lr: 1.73e-03, grad_scale: 1.152921504606847e+18 2024-08-21 09:32:21,470 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=5174750.0, ans=0.0 2024-08-21 09:32:46,666 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-21 09:33:38,053 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=5175150.0, ans=0.2 2024-08-21 09:33:47,779 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 32 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-21 09:34:04,753 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 13750, loss[loss=0.1186, beats_loss=0.008679, ecapa_loss=0.0001401, whisper_loss=0.1085, over 22593.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01043, ecapa_loss=0.0001386, whisper_loss=0.08924, over 3850188.92 frames. ], batch size: 89, lr: 1.73e-03, grad_scale: 1.152921504606847e+18 2024-08-21 09:34:05,367 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5175250.0, ans=0.0 2024-08-21 09:34:08,489 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5175250.0, ans=0.1 2024-08-21 09:35:03,707 INFO [train_multi_KD3.py:845] (1/4) A total of 52 cuts. 19 from LS+wenet, 14 from Vox, 19 fro AS 2024-08-21 09:35:27,731 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.987e+01 2.292e+01 2.541e+01 2.786e+01 7.539e+01, threshold=5.082e+01, percent-clipped=3.0 2024-08-21 09:35:27,978 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 
30 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-21 09:35:33,371 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.64 vs. limit=15.0 2024-08-21 09:35:46,032 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 28 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-21 09:35:46,415 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5175650.0, ans=0.1 2024-08-21 09:36:06,924 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 13800, loss[loss=0.1221, beats_loss=0.00922, ecapa_loss=0.0001589, whisper_loss=0.1112, over 18404.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01039, ecapa_loss=0.0001398, whisper_loss=0.08994, over 3825056.18 frames. ], batch size: 75, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:36:17,320 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=5175750.0, ans=0.0 2024-08-21 09:36:31,585 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=5175850.0, ans=0.035 2024-08-21 09:36:39,384 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.74 vs. limit=22.5 2024-08-21 09:37:08,794 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=5175950.0, ans=0.2 2024-08-21 09:37:38,125 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=5176050.0, ans=0.0 2024-08-21 09:37:49,399 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 26 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-21 09:37:59,147 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.98 vs. 
limit=22.5 2024-08-21 09:38:01,931 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.98 vs. limit=15.0 2024-08-21 09:38:06,335 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.27 vs. limit=15.0 2024-08-21 09:38:13,670 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5176150.0, ans=0.125 2024-08-21 09:38:21,657 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 13850, loss[loss=0.07564, beats_loss=0.01247, ecapa_loss=0.0001386, whisper_loss=0.06179, over 15155.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01039, ecapa_loss=0.0001395, whisper_loss=0.08937, over 3808143.63 frames. ], batch size: 63, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:38:43,803 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 35 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-21 09:38:48,375 INFO [train_multi_KD3.py:845] (1/4) A total of 73 cuts. 
19 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-21 09:39:18,150 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=5176450.0, ans=0.2 2024-08-21 09:39:28,868 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=5176450.0, ans=10.0 2024-08-21 09:39:50,238 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=5176550.0, ans=0.125 2024-08-21 09:39:58,103 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.935e+01 2.312e+01 2.430e+01 2.795e+01 3.774e+01, threshold=4.861e+01, percent-clipped=0.0 2024-08-21 09:39:58,864 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5176550.0, ans=0.125 2024-08-21 09:40:13,491 INFO [train_multi_KD3.py:845] (1/4) A total of 78 cuts. 31 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-21 09:40:33,012 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 13900, loss[loss=0.1037, beats_loss=0.01269, ecapa_loss=0.0001309, whisper_loss=0.08972, over 16261.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01043, ecapa_loss=0.00014, whisper_loss=0.08876, over 3798516.80 frames. 
], batch size: 67, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:40:39,641 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5176750.0, ans=0.125 2024-08-21 09:40:39,662 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=5176750.0, ans=0.125 2024-08-21 09:40:58,765 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5176850.0, ans=0.125 2024-08-21 09:41:08,339 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=5176850.0, ans=0.125 2024-08-21 09:41:15,983 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.60 vs. limit=15.0 2024-08-21 09:41:16,786 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 33 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-21 09:42:31,822 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 13950, loss[loss=0.08588, beats_loss=0.01128, ecapa_loss=0.0001094, whisper_loss=0.07351, over 18968.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01042, ecapa_loss=0.0001397, whisper_loss=0.08939, over 3788489.46 frames. 
], batch size: 73, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:42:45,722 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=5177250.0, ans=0.2 2024-08-21 09:42:48,666 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=5177250.0, ans=0.5 2024-08-21 09:42:52,800 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=5177350.0, ans=0.04949747468305833 2024-08-21 09:43:11,985 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 25 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-21 09:43:24,260 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 26 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-21 09:43:36,943 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 19 from LS+wenet, 17 from Vox, 15 fro AS 2024-08-21 09:43:58,444 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.754e+01 2.301e+01 2.643e+01 2.947e+01 4.607e+01, threshold=5.286e+01, percent-clipped=0.0 2024-08-21 09:44:13,053 INFO [train_multi_KD3.py:845] (1/4) A total of 83 cuts. 26 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-21 09:44:31,667 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 14000, loss[loss=0.1067, beats_loss=0.0108, ecapa_loss=0.0001395, whisper_loss=0.09449, over 22123.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01043, ecapa_loss=0.0001395, whisper_loss=0.08961, over 3799224.56 frames. ], batch size: 93, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:44:40,423 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.49 vs. limit=12.0 2024-08-21 09:45:24,053 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.29 vs. 
limit=12.0 2024-08-21 09:45:59,002 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.86 vs. limit=15.0 2024-08-21 09:46:23,927 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 14050, loss[loss=0.1005, beats_loss=0.01333, ecapa_loss=0.0001177, whisper_loss=0.08603, over 17268.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.0104, ecapa_loss=0.0001397, whisper_loss=0.08933, over 3797771.59 frames. ], batch size: 69, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:46:39,728 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=5178250.0, ans=0.0 2024-08-21 09:46:50,092 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=5178350.0, ans=0.0 2024-08-21 09:46:57,449 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=5178350.0, ans=0.05 2024-08-21 09:47:27,507 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5178450.0, ans=0.1 2024-08-21 09:47:31,517 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 30 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-21 09:47:34,107 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=5178550.0, ans=0.125 2024-08-21 09:47:42,455 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=5178550.0, ans=0.125 2024-08-21 09:47:48,873 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.709e+01 2.318e+01 2.539e+01 2.809e+01 4.112e+01, threshold=5.078e+01, percent-clipped=0.0 2024-08-21 09:48:08,770 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 
39 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-21 09:48:16,453 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 14100, loss[loss=0.1058, beats_loss=0.01025, ecapa_loss=0.0001346, whisper_loss=0.09418, over 20736.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01031, ecapa_loss=0.0001406, whisper_loss=0.09003, over 3822840.18 frames. ], batch size: 83, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:48:26,414 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5178750.0, ans=0.125 2024-08-21 09:49:00,695 INFO [train_multi_KD3.py:845] (1/4) A total of 64 cuts. 23 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-21 09:49:06,524 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=5178950.0, ans=0.0 2024-08-21 09:49:20,328 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=5179050.0, ans=0.025 2024-08-21 09:49:42,076 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 32 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-21 09:49:51,765 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5179150.0, ans=0.125 2024-08-21 09:50:01,587 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 14150, loss[loss=0.08687, beats_loss=0.01176, ecapa_loss=0.0001383, whisper_loss=0.07373, over 22417.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01026, ecapa_loss=0.0001402, whisper_loss=0.09021, over 3837961.49 frames. 
], batch size: 90, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:50:05,166 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=5179250.0, ans=0.125 2024-08-21 09:50:33,292 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=5179350.0, ans=0.2 2024-08-21 09:50:40,699 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.62 vs. limit=12.0 2024-08-21 09:50:51,238 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 30 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-21 09:51:19,379 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.247e+01 2.512e+01 2.809e+01 5.073e+01, threshold=5.024e+01, percent-clipped=0.0 2024-08-21 09:51:23,759 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.72 vs. limit=15.0 2024-08-21 09:51:29,186 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 12 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-21 09:51:41,156 INFO [train_multi_KD3.py:845] (1/4) A total of 85 cuts. 21 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-21 09:51:46,667 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=5179650.0, ans=0.0 2024-08-21 09:51:49,010 INFO [train_multi_KD3.py:845] (1/4) A total of 52 cuts. 24 from LS+wenet, 10 from Vox, 18 fro AS 2024-08-21 09:51:52,968 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 14200, loss[loss=0.106, beats_loss=0.009673, ecapa_loss=0.0001803, whisper_loss=0.09448, over 18474.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0103, ecapa_loss=0.0001397, whisper_loss=0.09015, over 3822026.78 frames. 
], batch size: 75, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:51:53,209 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 30 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-21 09:51:56,066 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0 2024-08-21 09:52:12,465 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=5179750.0, ans=0.125 2024-08-21 09:52:24,280 INFO [train_multi_KD3.py:845] (1/4) A total of 65 cuts. 18 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-21 09:52:42,668 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=5179950.0, ans=0.0 2024-08-21 09:53:12,596 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=5180050.0, ans=0.125 2024-08-21 09:53:25,523 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 22 from LS+wenet, 10 from Vox, 28 fro AS 2024-08-21 09:53:34,650 INFO [train_multi_KD3.py:845] (1/4) A total of 89 cuts. 18 from LS+wenet, 34 from Vox, 37 fro AS 2024-08-21 09:53:56,775 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 14250, loss[loss=0.1145, beats_loss=0.009158, ecapa_loss=0.0001421, whisper_loss=0.1039, over 22629.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01034, ecapa_loss=0.0001381, whisper_loss=0.09025, over 3824747.37 frames. ], batch size: 90, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:55:09,444 INFO [train_multi_KD3.py:845] (1/4) A total of 79 cuts. 
28 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-21 09:55:24,254 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5180550.0, ans=0.125 2024-08-21 09:55:26,038 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.707e+01 2.191e+01 2.440e+01 2.689e+01 6.038e+01, threshold=4.881e+01, percent-clipped=1.0 2024-08-21 09:55:55,632 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5180650.0, ans=0.125 2024-08-21 09:56:03,526 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 14300, loss[loss=0.1146, beats_loss=0.01117, ecapa_loss=0.0001198, whisper_loss=0.1023, over 22255.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0103, ecapa_loss=0.0001374, whisper_loss=0.09094, over 3821605.20 frames. ], batch size: 88, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:56:11,020 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 30 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-21 09:56:33,953 INFO [train_multi_KD3.py:845] (1/4) A total of 57 cuts. 18 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-21 09:56:36,321 INFO [train_multi_KD3.py:845] (1/4) A total of 84 cuts. 21 from LS+wenet, 15 from Vox, 48 fro AS 2024-08-21 09:57:15,100 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.13 vs. limit=12.0 2024-08-21 09:58:09,806 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 14350, loss[loss=0.1083, beats_loss=0.007571, ecapa_loss=0.0001164, whisper_loss=0.0996, over 14167.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01024, ecapa_loss=0.0001367, whisper_loss=0.09101, over 3817748.71 frames. ], batch size: 52, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:58:17,481 INFO [train_multi_KD3.py:845] (1/4) A total of 51 cuts. 
19 from LS+wenet, 17 from Vox, 15 fro AS 2024-08-21 09:58:41,886 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=5181350.0, ans=0.125 2024-08-21 09:58:45,037 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5181350.0, ans=0.0 2024-08-21 09:58:49,594 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 20 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-21 09:59:29,271 INFO [train_multi_KD3.py:845] (1/4) A total of 56 cuts. 19 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-21 09:59:34,304 INFO [train_multi_KD3.py:845] (1/4) A total of 93 cuts. 35 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-21 09:59:35,120 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.52 vs. limit=15.0 2024-08-21 09:59:44,545 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.751e+01 2.249e+01 2.480e+01 2.767e+01 3.884e+01, threshold=4.959e+01, percent-clipped=0.0 2024-08-21 09:59:55,478 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5181650.0, ans=0.125 2024-08-21 10:00:09,895 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5181650.0, ans=0.125 2024-08-21 10:00:19,254 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 14400, loss[loss=0.06579, beats_loss=0.01152, ecapa_loss=0.0001027, whisper_loss=0.05324, over 13423.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01022, ecapa_loss=0.000138, whisper_loss=0.09104, over 3780338.97 frames. 
], batch size: 49, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 10:00:36,514 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=5181750.0, ans=6.0 2024-08-21 10:00:40,836 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 23 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-21 10:00:57,370 INFO [train_multi_KD3.py:845] (1/4) A total of 67 cuts. 15 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-21 10:01:07,693 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5181850.0, ans=0.1 2024-08-21 10:01:15,936 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 26 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-21 10:01:47,476 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5182050.0, ans=0.125 2024-08-21 10:01:57,328 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=5182050.0, ans=0.2 2024-08-21 10:02:10,076 INFO [train_multi_KD3.py:845] (1/4) A total of 76 cuts. 29 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-21 10:02:10,352 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5182150.0, ans=0.125 2024-08-21 10:02:24,279 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.45 vs. limit=12.0 2024-08-21 10:02:28,991 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 14450, loss[loss=0.1001, beats_loss=0.009733, ecapa_loss=0.0001336, whisper_loss=0.08906, over 19173.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01022, ecapa_loss=0.0001385, whisper_loss=0.09078, over 3769457.89 frames. 
], batch size: 75, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 10:02:40,485 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=5182250.0, ans=0.125 2024-08-21 10:02:45,545 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=5182250.0, ans=10.0 2024-08-21 10:02:51,067 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=5182250.0, ans=0.07 2024-08-21 10:03:02,111 INFO [train_multi_KD3.py:845] (1/4) A total of 70 cuts. 16 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-21 10:03:34,392 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.30 vs. limit=15.0 2024-08-21 10:03:38,890 INFO [train_multi_KD3.py:845] (1/4) A total of 63 cuts. 18 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-21 10:03:58,488 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5182550.0, ans=0.1 2024-08-21 10:04:01,188 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5182550.0, ans=0.125 2024-08-21 10:04:03,211 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.252e+01 2.493e+01 2.789e+01 4.722e+01, threshold=4.987e+01, percent-clipped=0.0 2024-08-21 10:04:03,460 INFO [train_multi_KD3.py:845] (1/4) A total of 60 cuts. 
20 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-21 10:04:35,577 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5182650.0, ans=0.1 2024-08-21 10:04:37,991 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=5182750.0, ans=0.0 2024-08-21 10:04:40,401 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 14500, loss[loss=0.1096, beats_loss=0.009924, ecapa_loss=0.0001387, whisper_loss=0.09825, over 13166.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01024, ecapa_loss=0.0001377, whisper_loss=0.09127, over 3777213.00 frames. ], batch size: 51, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 10:04:40,909 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=5182750.0, ans=0.0 2024-08-21 10:04:44,232 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.08 vs. limit=10.0 2024-08-21 10:04:49,018 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5182750.0, ans=0.125 2024-08-21 10:05:00,072 INFO [train_multi_KD3.py:845] (1/4) A total of 86 cuts. 24 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-21 10:05:34,773 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.64 vs. limit=12.0 2024-08-21 10:05:37,502 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5182950.0, ans=0.125 2024-08-21 10:05:39,802 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.05 vs. 
limit=10.0 2024-08-21 10:06:11,174 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=5183050.0, ans=0.125 2024-08-21 10:06:21,875 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=5183150.0, ans=0.0 2024-08-21 10:06:48,199 INFO [scaling.py:1120] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-21 10:06:50,541 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 14550, loss[loss=0.1104, beats_loss=0.01073, ecapa_loss=0.0001364, whisper_loss=0.09832, over 22017.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01018, ecapa_loss=0.0001383, whisper_loss=0.09171, over 3823261.74 frames. ], batch size: 86, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 10:07:28,197 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 26 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-21 10:07:41,016 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=5183450.0, ans=0.0 2024-08-21 10:08:25,928 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.769e+01 2.316e+01 2.552e+01 2.879e+01 5.154e+01, threshold=5.103e+01, percent-clipped=1.0 2024-08-21 10:08:57,125 INFO [train_multi_KD3.py:845] (1/4) A total of 90 cuts. 30 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-21 10:09:01,747 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 14600, loss[loss=0.1091, beats_loss=0.009485, ecapa_loss=0.0001317, whisper_loss=0.09833, over 22139.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01019, ecapa_loss=0.0001389, whisper_loss=0.09103, over 3858558.76 frames. 
], batch size: 90, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 10:09:13,863 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5183750.0, ans=0.1 2024-08-21 10:10:03,193 INFO [train_multi_KD3.py:845] (1/4) A total of 92 cuts. 29 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-21 10:10:38,593 INFO [train_multi_KD3.py:845] (1/4) A total of 91 cuts. 30 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-21 10:10:53,969 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.12 vs. limit=12.0 2024-08-21 10:10:55,940 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=5184150.0, ans=0.0 2024-08-21 10:11:03,539 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 14650, loss[loss=0.1113, beats_loss=0.01078, ecapa_loss=0.0001368, whisper_loss=0.0992, over 23612.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01024, ecapa_loss=0.0001378, whisper_loss=0.09101, over 3892878.24 frames. ], batch size: 92, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 10:11:06,000 INFO [train_multi_KD3.py:845] (1/4) A total of 88 cuts. 32 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-21 10:11:25,867 INFO [train_multi_KD3.py:845] (1/4) A total of 74 cuts. 27 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-21 10:11:43,504 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5184350.0, ans=0.125 2024-08-21 10:12:00,227 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 16 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-21 10:12:26,780 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.717e+01 2.280e+01 2.543e+01 2.836e+01 3.661e+01, threshold=5.086e+01, percent-clipped=0.0 2024-08-21 10:12:26,989 INFO [train_multi_KD3.py:845] (1/4) A total of 53 cuts. 
13 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-21 10:12:34,758 INFO [train_multi_KD3.py:845] (1/4) A total of 71 cuts. 24 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-21 10:12:42,672 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=5184650.0, ans=0.035 2024-08-21 10:12:45,871 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=5184650.0, ans=0.0 2024-08-21 10:12:54,492 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 21 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-21 10:13:01,677 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 14700, loss[loss=0.1027, beats_loss=0.01135, ecapa_loss=0.0001281, whisper_loss=0.09012, over 22802.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01035, ecapa_loss=0.0001375, whisper_loss=0.08994, over 3853294.85 frames. ], batch size: 92, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 10:13:16,870 INFO [train_multi_KD3.py:845] (1/4) A total of 75 cuts. 29 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-21 10:14:54,845 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.70 vs. limit=6.0 2024-08-21 10:15:04,797 INFO [train_multi_KD3.py:1117] (1/4) Epoch 35, batch 14750, loss[loss=0.1239, beats_loss=0.01087, ecapa_loss=0.0001636, whisper_loss=0.1114, over 21521.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01038, ecapa_loss=0.000138, whisper_loss=0.0896, over 3859298.78 frames. ], batch size: 88, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 10:15:10,469 INFO [train_multi_KD3.py:845] (1/4) A total of 68 cuts. 
21 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-21 10:15:43,001 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5185350.0, ans=0.125 2024-08-21 10:16:04,881 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=5185450.0, ans=0.2 2024-08-21 10:16:07,786 INFO [scaling.py:1024] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.05 vs. limit=15.0 2024-08-21 10:16:19,912 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5185450.0, ans=0.125 2024-08-21 10:16:39,170 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.647e+01 2.269e+01 2.538e+01 2.783e+01 3.650e+01, threshold=5.076e+01, percent-clipped=0.0 2024-08-21 10:16:52,792 INFO [scaling.py:214] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=5185650.0, ans=0.0 2024-08-21 10:16:57,151 INFO [train_multi_KD3.py:1466] (1/4) Done!